Abstract This paper proposes FMAP (Forward Multi-Agent Planning), a
fully-distributed multi-agent planning method that integrates planning and
coordination. Although FMAP is specifically aimed at solving problems that
require cooperation among agents, the flexibility of the domain-independent
planning model allows FMAP to tackle multi-agent planning tasks of any type.
In FMAP, agents jointly explore the plan space by building up refinement
plans through a complete and flexible forward-chaining partial-order
planner. The search is guided by $h_{DTG}$, a novel heuristic function that
is based on the concepts of Domain Transition Graph and frontier state and
is optimized to evaluate plans in distributed environments. Agents in FMAP
apply an advanced privacy model that allows them to adequately keep private
information while communicating only the data of the refinement plans that
is relevant to each of the participating agents. Experimental results show
that FMAP is a general-purpose approach that efficiently solves
tightly-coupled domains that have specialized agents and cooperative goals
as well as loosely-coupled problems. Specifically, the empirical evaluation
shows that FMAP outperforms current MAP systems at solving complex planning
tasks that are adapted from the International Planning Competition
benchmarks.
1 Introduction

Multi-agent planning (MAP) introduces a social approach to planning by which
multiple intelligent entities work together to solve planning tasks that
they are not able to solve by themselves, or to at least accomplish them
better by cooperating [40]. MAP places the focus on the collective effort of
multiple agents to accomplish tasks by combining their knowledge and
capabilities.

The complexity of solving a MAP task directly depends on its typology. In
order to illustrate the features of a MAP task, let us introduce a brief
application example.

Example 1 Consider the transportation task in Fig. 1, which involves three
different agents. There are two transport agencies (ta1 and ta2), each of
which has a truck (t1 and t2, respectively). The two agencies work in two
different geographical areas, ga1 and ga2, respectively. The third agent is
a factory, f, which is placed in the area ga2. To manufacture products,
factory f requires raw materials (rm) that are gathered from area ga1. In
this task, ta1 and ta2 have the same capabilities, but they act in different
areas; i.e., they are spatially distributed agents. Additionally, the
factory agent f is functionally different from ta1 and ta2. The goal of this
task is for f to manufacture a set of final products. In order to carry out
the task, ta1 will send its truck t1 to load the raw materials rm located in
l2 and then transport them to a storage facility (sf) that is placed in the
intersection of both geographical areas. Then, ta2 will complete the
delivery by using its truck t2 to transport the materials from sf to f,
which will in turn manufacture the final products. Therefore, this task
involves three specialized agents that are spatially and functionally
distributed and that must cooperate to accomplish a common goal.

Example 1 emphasizes most of the key elements of a MAP task. First, the
spatial and/or functional distribution of planning agents gives rise to
specialized agents that have different knowledge and capabilities. In turn,
this information distribution stresses the issue of privacy, which is one of
the basic aspects that should be considered in multi-agent applications
[33].

Since the three parties involved in Example 1 are specialized in different
functional or geographical areas of the task, most of the information
managed by factory f is not relevant for the transport agencies and
vice versa. The same occurs with the transport agencies ta1 and ta2.
Additionally, agents may not be willing to share the sensitive information
of their internal procedures with the others. For instance, ta1 and ta2 are
cooperating in this particular delivery task, but they might be potential
competitors since they work in the same business sector. Therefore, agents
in a MAP context want to minimize the information they share with each
other, either for strategic reasons or simply because it is not relevant
for the rest of the agents in order to address the planning task.

Besides the need for computational or information distribution, privacy is
also one of the reasons to adopt a multi-agent approach. This aspect,
however, has been traditionally relegated in MAP, particularly by the
planning community ED- While some approaches define a basic notion of
privacy EH2SL others allow agents to share detailed parts of their plans or
do not take private information into account at all [22].

The complexity of a MAP task is often described by means of its coupling
level [4], which is measured as the number of interactions that arise among
agents during the resolution of a MAP task. According to this parameter,
MAP tasks can be classified into loosely-coupled tasks (which present few
interactions among agents) and tightly-coupled tasks (which involve many
interactions among agents). The coupling level, however, does not take into
consideration one key aspect of MAP tasks: the presence of cooperative
goals; i.e., goals that cannot be solved individually by any agent since
they require the cooperation of specialized agents. Example 1 illustrates a
tightly-coupled task with one such goal since none of the agents can
achieve the manufacturing of the final products by itself. Instead, they
must make use of their specialized capabilities and interact with each
other to deliver the raw materials and manufacture the final products.

In this paper, we present FMAP (Forward MAP), which is a domain-independent
MAP system that is designed to cope with a great variety of planning tasks
of different complexity and coupling level. FMAP is a fully distributed
method that interleaves planning and coordination by following a
cooperative refinement planning approach. This search scheme allows us to
efficiently coordinate agents’ actions in any type of planning task (either
loosely-coupled or tightly-coupled) as well as to handle cooperative goals.

FMAP relies on a theoretical model which defines a more sophisticated
notion of privacy than most of the existing MAP systems. Instead of using a
single set of private data, FMAP allows agents to declare the information
they will share with each other. For instance, the transport agency ta2 in
Example 1 will share with factory f information that is likely to be
different from the information shared with agent ta1. Our system enhances
privacy by minimizing the information that agents need to disclose. FMAP is
a complete and reliable planning system that has proven to be very
competitive when compared to other state-of-the-art MAP systems. The
experimental results will show that FMAP is particularly effective for
solving tightly-coupled MAP problems with cooperative goals.

This article is organized as follows: Section 2 presents some related work
on multi-agent planning, with an emphasis on issues like the coupling level
of planning tasks, privacy, or cooperative goals. Section 3 formalizes the
notion of a MAP task; Section 4 describes the main components of FMAP, the
search procedure, and the DTG-based heuristic function; finally, Section 5
provides a thorough experimental evaluation of FMAP and Section 6 concludes
the paper.


2 Related work 

In the literature, there are two main approaches for solving MAP tasks like
the one described in Example 1. Centralized MAP involves using an
intermediary agent that has complete knowledge of the task. The distributed
or decentralized approach spreads the planning responsibility among agents,
which are in charge of interacting with each other to coordinate their
local solutions, if necessary [28, 18]. The adoption of a centralized
approach is aimed at improving the planner performance by taking advantage
of the inherent structure of the MAP tasks [22, 15]. Centralized approaches
assume a single planning entity that has complete knowledge of the task,
which is rather unrealistic if the parties involved in the task have
sensitive private information that they are not willing to disclose [32].
In Example 1, the three agents involved in the task want to protect the
information regarding their internal processes and business strategies, so
a centralized setting is not an acceptable solution.

FMAP: Distributed Cooperative Multi-Agent Planning

We then focus on fully distributed MAP, that is, the problem of
coordinating agents in a shared environment where information is
distributed. The distributed MAP setting involves two main tasks: the
planning of local solutions and the coordination of the agents’ plans into
a global solution. Coordination can be performed at one or various stages
of the distributed resolution of a MAP task. Some techniques are used for
problems in which agents build local plans for the individual goals that
they have been assigned. MAP is about coordinating the local plans of
agents so as to mutually benefit by avoiding the duplication of effort. In
this case, the goal is not to build a joint plan among entities that are
functionally or spatially distributed but rather to apply plan merging to
coordinate the local plans of multiple agents that are capable of achieving
the problem goals
by themselves 0. 

There is a large body of work on plan-merging techniques. The work in [7]
introduces a distributed coordination framework based on partial-order
planning that addresses the interactions that emerge between the agents’
local plans. This framework, however, does not consider privacy. The
proposal in [36] is based on the iterative revision of the agents’ local
plans. Agents in this model cooperate by mutually adapting their local
plans, with a focus on improving their common or individual benefit. This
approach also ignores privacy and agents are assumed to be fully
cooperative. The approach in [39] uses multi-agent plan repair to solve
inconsistencies among the agents’ local plans while maintaining privacy.
μ-SATPLAN [9] extends a satisfiability-based planner to coordinate the
agents’ local plans by studying positive and negative interactions among
them.

Plan-merging techniques are not very well suited for coping with
tightly-coupled tasks as they may introduce exponentially many ordering
constraints in problems that require great coordination effort [7]. In
general, plan merging is not an effective method for attaining cooperative
goals since this resolution scheme generally assumes that each agent is
able to solve a subset of the task’s goals by itself. However, some
approaches use plan merging to coordinate local plans of specialized
agents. In this case, the effort is placed on discovering the interaction
points among agents through the public information that they share. For
instance, Planning First [25] introduces a cooperative MAP approach for
loosely-coupled tasks, in which specialized agents carry out planning
individually through a state-based planner. The resulting local plans are
then coordinated by solving a distributed Constraint Satisfaction Problem
(CSP) [16]. This combination of CSP and planning to solve MAP tasks was
originally introduced by the MA-STRIPS framework [4].

Another major research trend in MAP interleaves planning and coordination,
providing a more unified vision of cooperative MAP. One of the first
approaches to domain-independent MAP is the Generalized Partial Global
Planning (GPGP) framework [23]. Agents in GPGP have a partial view of the
world and communicate their local plans to the rest of the agents, which in
turn merge this information into their own partial global plan in order to
improve it. Approaches to continual planning (interleaving planning and
execution in a world undergoing continual change) assume there is
uncertainty in the world state and therefore agents do not have a complete
view of the world [5]. Specifically in [5], agents have a limited knowledge
of the environment and limited capabilities, but the authors do not
explicitly deal with a functional distribution among agents or cooperative
goals. TFPOP is a fully centralized approach that combines temporal and
forward-chaining partial-order planning to solve loosely-coupled MAP tasks
[22]. The Best-Response Planning algorithm departs from an initial joint
plan that is built through the Planning First MAP system [25] and
iteratively improves the quality of this initial plan by applying
cost-optimal planning HU. Agents can only access the public information of
the other agents’ plans, thereby preserving privacy, and they optimize
their plans with the aim to converge to a Nash equilibrium regarding their
preferences. MAP-POP is a fully distributed method that effectively
maintains the agents’ privacy [MEF]. Agents in MAP-POP perform an
incomplete partial-order planning search to progressively develop and
coordinate a joint plan until its completion.

Finally, MAPR is a recent planner that performs goal allocation to each
agent [2]. Agents iteratively solve the assigned goals by extending the
plan of the previous agent. In this approach, agents work under limited
knowledge of the environment by obfuscating the private information in
their plans. MAPR is particularly effective for loosely-coupled problems,
but it cannot deal with tasks that feature specialized agents and
cooperative goals since it assumes that each goal is achieved by a single
agent. Section 5 will show a comparative performance evaluation between
MAPR and FMAP, our proposed approach.

Alejandro Torreno et al.

3 MAP task formalization 

Agents in FMAP work with limited knowledge of the planning task by assuming
that information that is not represented in an agent’s model is unknown to
the agent. The states of the world are modeled through a finite set of
state variables, $\mathcal{V}$, each of which is associated to a finite
domain, $\mathcal{D}_v$, of mutually exclusive values that refer to the
objects in the world. Assigning a value $d$ to a variable
$v \in \mathcal{V}$ generates a fluent. A positive fluent is a tuple
$\langle v, d \rangle$, which indicates that the variable $v$ takes the
value $d$. A negative fluent is of the form $\langle v, \neg d \rangle$,
indicating that $v$ does not take the value $d$. A state $S$ is a set of
positive and negative fluents.

An action is a tuple $\alpha = \langle PRE(\alpha), EFF(\alpha) \rangle$,
where $PRE(\alpha)$ is a finite set of fluents that represents the
preconditions of $\alpha$, and $EFF(\alpha)$ is a finite set of positive and
negative variable assignments that model the effects of $\alpha$. Executing
an action $\alpha$ in a world state $S$ leads to a new world state $S'$ as a
result of applying $EFF(\alpha)$ over $S$. An effect of the form $(v = d)$
assigns the value $d$ to the variable $v$, i.e., it adds the fluent
$\langle v, d \rangle$ to $S'$ as well as a fluent
$\langle v, \neg d' \rangle$ for each other value $d'$ in the variable
domain in order to have a consistent state representation. Additionally,
any fluent in $S$ of the form $\langle v, \neg d \rangle$ or
$\langle v, d'' \rangle$, $d'' \neq d$, is removed in state $S'$. This
latter modification removes any fluent that contradicts
$\langle v, d \rangle$. On the other hand, an assignment $(v \neq d)$ adds
the fluent $\langle v, \neg d \rangle$ to $S'$ and removes
$\langle v, d \rangle$ from $S'$, if such a fluent exists in $S$.

For instance, let us suppose that the transportation task in Example 1
includes a variable pos-rm that describes the position of the raw materials
rm, which can be any of the locations in the task. Let $S$ be a state that
includes a fluent $\langle pos\text{-}rm, l2 \rangle$, which indicates that
rm is placed in its initial location (see Fig. 1). Agent ta1 performs an
action to load rm into its truck t1, which includes an effect of the form
$(pos\text{-}rm = t1)$. The application of this action results in a new
world state $S'$ that will include a fluent
$\langle pos\text{-}rm, t1 \rangle$ and fluents of the form
$\langle pos\text{-}rm, \neg l \rangle$ for each other location
$l \neq t1$; the fluent $\langle pos\text{-}rm, l2 \rangle$ will no longer
be in $S'$.
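The state update described above can be illustrated with a minimal Python
sketch. The encoding of fluents as (variable, value, positive) triples and
the function names apply_assignment and apply_negation are assumptions made
for this illustration only; they are not part of FMAP.

```python
from typing import Set, Tuple

# Hypothetical encoding: (variable, value, positive) stands for <v, d> when
# positive is True and for <v, not d> when positive is False.
Fluent = Tuple[str, str, bool]


def apply_assignment(state: Set[Fluent], var: str, value: str,
                     domain: Set[str]) -> Set[Fluent]:
    """Apply an effect (var = value) and return the successor state."""
    successor = set()
    for v, d, positive in state:
        contradicts = v == var and (
            (positive and d != value) or (not positive and d == value)
        )
        if not contradicts:
            successor.add((v, d, positive))
    # Add <var, value> plus a negative fluent for every other domain value.
    successor.add((var, value, True))
    for other in domain - {value}:
        successor.add((var, other, False))
    return successor


def apply_negation(state: Set[Fluent], var: str, value: str) -> Set[Fluent]:
    """Apply an effect (var != value) and return the successor state."""
    successor = {f for f in state if f != (var, value, True)}
    successor.add((var, value, False))
    return successor


# The load action of the example: rm moves from l2 into truck t1.
# The domain below is ta1's local view of pos-rm.
domain_pos_rm = {"l1", "l2", "sf", "t1"}
s = {("pos-rm", "l2", True), ("pos-t1", "l2", True)}
s_next = apply_assignment(s, "pos-rm", "t1", domain_pos_rm)
```

In this sketch, s_next keeps the unrelated fluent on pos-t1, contains the
positive fluent for t1, and holds one negative fluent for each of l1, l2,
and sf.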

Definition 1 A MAP task is defined as a tuple
$\mathcal{T}_{MAP} = \langle \mathcal{AG}, \mathcal{V}, \mathcal{I}, \mathcal{G}, \mathcal{A} \rangle$.
$\mathcal{AG} = \{1, \ldots, n\}$ is a finite non-empty set of agents.
$\mathcal{V} = \bigcup_{i \in \mathcal{AG}} \mathcal{V}^i$, where
$\mathcal{V}^i$ is the set of state variables known to an agent $i$.
$\mathcal{I} = \bigcup_{i \in \mathcal{AG}} \mathcal{I}^i$ is a set of
fluents that defines the initial state of $\mathcal{T}_{MAP}$. Since
specialized agents are allowed, they may only know a subset of
$\mathcal{I}$. Given two agents $i$ and $j$,
$\mathcal{I}^i \cap \mathcal{I}^j$ may or may not be $\emptyset$; in any
case, the initial states of the agents never contradict each other.
$\mathcal{G}$ is the set of goals of $\mathcal{T}_{MAP}$, i.e., the values
of the state variables that agents have to achieve in order to accomplish
$\mathcal{T}_{MAP}$. Finally,
$\mathcal{A} = \bigcup_{i \in \mathcal{AG}} \mathcal{A}^i$ is the set of
planning actions of the agents. $\mathcal{A}^i$ and $\mathcal{A}^j$ of two
specialized agents $i$ and $j$ will typically be two disjoint sets since
the agents have their own different capabilities; otherwise,
$\mathcal{A}^i$ and $\mathcal{A}^j$ may overlap. $\mathcal{A}$ includes two
fictitious actions $\alpha_i$ and $\alpha_f$ that do not belong to the
action set of any particular agent: $\alpha_i$ represents the initial state
of $\mathcal{T}_{MAP}$, i.e., $PRE(\alpha_i) = \emptyset$ and
$EFF(\alpha_i) = \mathcal{I}$, while $\alpha_f$ represents the global goals
of $\mathcal{T}_{MAP}$, i.e., $PRE(\alpha_f) = \mathcal{G}$ and
$EFF(\alpha_f) = \emptyset$.

As discussed in Example 1, our model considers specialized agents that can
be functionally and/or spatially distributed. This specialization defines
the local view that each agent has of the MAP task. Local views are a
typical characteristic of multi-agent systems and other distributed
systems. For instance, distributed CSPs use local views, such that agents
only receive information about the constraints in which they are involved
PH SI]. Next, we define the information of an agent $i$ on a planning task
$\mathcal{T}_{MAP}$.

The view of an agent $i$ on a MAP task $\mathcal{T}_{MAP}$ is defined as
$\mathcal{T}^i_{MAP} = \langle \mathcal{V}^i, \mathcal{A}^i, \mathcal{I}^i, \mathcal{G} \rangle$.
$\mathcal{V}^i$ is the set of state variables known to agent $i$;
$\mathcal{A}^i \subseteq \mathcal{A}$ is the set of its capabilities
(planning actions); $\mathcal{I}^i$ is the subset of fluents of the initial
state $\mathcal{I}$ that are visible to agent $i$; and $\mathcal{G}$ is the
set of global goals of $\mathcal{T}_{MAP}$. Since agents in FMAP are fully
cooperative, they are all aware of the global goals of the task. Obviously,
because of specialization, a particular agent may not understand the goals
as specified in $\mathcal{G}$; defining $\mathcal{G}$ as global goals
implies that all agents contribute to the achievement of $\mathcal{G}$,
either directly (achieving a $g \in \mathcal{G}$) or indirectly
(introducing actions whose effects help other agents achieve $g$).

The state variables of an agent $i$ are determined by the view the agent
has on the initial state, $\mathcal{I}^i$, the planning actions it can
perform, $\mathcal{A}^i$, and the set of goals of $\mathcal{T}_{MAP}$. This
also affects the domain $\mathcal{D}_v$ of a variable $v$. We define
$\mathcal{D}^i_v \subseteq \mathcal{D}_v$ as the set of values of the
variable $v$ that are known to agent $i$.

Consider again the pos-rm variable in Example 1. The domain of pos-rm
contains all the locations in the transportation task, including the
factory f, the storage facility sf, and the trucks; that is,
$\mathcal{D}_{pos\text{-}rm} = \{l1, l2, l3, l4, f, sf, t1, t2\}$. However,
agents ta1 and ta2 have local knowledge about the domain of pos-rm because
some of the values of such variable refer to objects of $\mathcal{T}_{MAP}$
that are unknown to them. Hence, ta1 will manage
$\mathcal{D}^{ta1}_{pos\text{-}rm} = \{l1, l2, sf, t1\}$, while ta2 will
manage $\mathcal{D}^{ta2}_{pos\text{-}rm} = \{l3, l4, sf, f, t2\}$.

Agents in FMAP interact with each other by sharing information about their
state variables. For each pair of agents $i$ and $j$, the public
information they share is defined as
$\mathcal{V}^{ij} = \mathcal{V}^i \cap \mathcal{V}^j$. Additionally, some
of the values in the domain of a variable can also be public to both
agents. The set of values of a variable $v$ that are public to a pair of
agents $i$ and $j$ is defined as
$\mathcal{D}^{ij}_v = \mathcal{D}^i_v \cap \mathcal{D}^j_v$.

As Example 1 indicates, the pos-rm variable is public to agents ta1 and
ta2. The values that are public to both agents are defined as the
intersection of the values that are known to each of them,
$\mathcal{D}^{ta1\,ta2}_{pos\text{-}rm} = \{sf\}$. This way, the only
public location of rm for agents ta1 and ta2 is the storage facility sf,
which is precisely the intersection between the two geographical areas.
Hence, if agent ta1 places rm in sf, it will inform ta2 accordingly, and
vice versa. This allows agents ta1 and ta2 to work together while
minimizing the information they share with each other.
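The two intersections that define the public information of a pair of
agents can be sketched in a few lines of Python. The dictionary layout and
the names public_variables and public_values are assumptions of this
sketch; the variable pos-t2 is likewise assumed by symmetry with pos-t1,
and the domains of pos-t1 and pos-t2 are illustrative guesses.

```python
from typing import Dict, Set

# Hypothetical local views: each agent maps the variables it knows to the
# domain values it knows for that variable.
view_ta1: Dict[str, Set[str]] = {
    "pos-rm": {"l1", "l2", "sf", "t1"},
    "pos-t1": {"l1", "l2", "sf"},
}
view_ta2: Dict[str, Set[str]] = {
    "pos-rm": {"l3", "l4", "sf", "f", "t2"},
    "pos-t2": {"l3", "l4", "sf", "f"},
}


def public_variables(view_i: Dict[str, Set[str]],
                     view_j: Dict[str, Set[str]]) -> Set[str]:
    """Variables shared by both agents (intersection of their variable sets)."""
    return set(view_i) & set(view_j)


def public_values(view_i: Dict[str, Set[str]],
                  view_j: Dict[str, Set[str]], var: str) -> Set[str]:
    """Values of `var` known to both agents (intersection of local domains)."""
    return view_i.get(var, set()) & view_j.get(var, set())


shared_vars = public_variables(view_ta1, view_ta2)
shared_vals = public_values(view_ta1, view_ta2, "pos-rm")
```

With these inputs, shared_vars contains only pos-rm and shared_vals
contains only sf, matching the example in the text.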

Our MAP model is a multi-agent refinement planning framework, which is a
general method based on the refinement of the set of all possible plans.
The internal reasoning of agents in FMAP is configured as a Partial-Order
Planning (POP) search procedure. Other local search strategies are
applicable, as long as agents build partial-order plans. The following
concepts and definitions are standard terms from the POP paradigm m, which
have been adapted to state variables. Additionally, definitions also
account for the multi-agent nature of the planning task and the local views
of the task by the agents.

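Definition 2 A partial-order plan or partial plan is a tuple
$\Pi = \langle \Delta, \mathcal{OR}, \mathcal{CL} \rangle$.
$\Delta = \{\alpha \mid \alpha \in \mathcal{A}\}$ is the set of actions in
$\Pi$. $\mathcal{OR}$ is a finite set of ordering constraints ($\prec$) on
$\Delta$. $\mathcal{CL}$ is a finite set of causal links of the form
$\alpha \xrightarrow{\langle v, d \rangle} \beta$ or
$\alpha \xrightarrow{\langle v, \neg d \rangle} \beta$, where $\alpha$ and
$\beta$ are actions in $\Delta$. A causal link
$\alpha \xrightarrow{\langle v, d \rangle} \beta$ enforces precondition
$\langle v, d \rangle \in PRE(\beta)$ through an effect
$(v = d) \in EFF(\alpha)$ [12]. Similarly, a causal link
$\alpha \xrightarrow{\langle v, \neg d \rangle} \beta$ enforces
$\langle v, \neg d \rangle \in PRE(\beta)$ through an effect
$(v \neq d) \in EFF(\alpha)$ or $(v = d') \in EFF(\alpha)$, $d' \neq d$.

An empty partial plan is defined as
$\Pi_0 = \langle \Delta_0, \mathcal{OR}_0, \mathcal{CL}_0 \rangle$, where
$\mathcal{OR}_0$ and $\mathcal{CL}_0$ are empty sets, and $\Delta_0$
contains only the fictitious initial action $\alpha_i$. A partial plan
$\Pi$ for a task $\mathcal{T}_{MAP}$ will always contain $\alpha_i$.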


The introduction of new actions in a partial plan may trigger the
appearance of flaws. There are two types of flaws in a partial plan:
preconditions that are not yet solved (or supported) through a causal link,
and threats. A threat over a causal link
$\alpha \xrightarrow{\langle v, d \rangle} \beta$ is caused by an action
$\gamma$ that is not ordered w.r.t. $\alpha$ or $\beta$ and might
potentially modify the value of $v$ [12] ($(v \neq d) \in EFF(\gamma)$ or
$(v = d') \in EFF(\gamma)$, $d' \neq d$), making the causal link unsafe.
Threats are addressed by introducing either an ordering constraint
$\gamma \prec \alpha$ (this is called demotion because the causal link is
posted after the threatening action) or an ordering $\beta \prec \gamma$
(this is called promotion because the causal link is placed before the
threatening action) [12].
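The threat condition can be made concrete with a small Python sketch. It
assumes the usual POP reading in which an action threatens a link unless it
is already ordered before the producer or after the consumer; the data
layout, the action names, and the functions precedes and threatens are
hypothetical and only serve this illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encodings. An effect is (variable, value, positive), where
# positive=True is (v = d) and positive=False is (v != d). A causal link is
# (producer, (variable, value), consumer). Orderings are (before, after).
Effect = Tuple[str, str, bool]
Link = Tuple[str, Tuple[str, str], str]
Ordering = Tuple[str, str]


def precedes(orderings: Set[Ordering], first: str, second: str) -> bool:
    """True if `first` is ordered before `second`, directly or transitively."""
    stack, seen = [first], {first}
    while stack:
        node = stack.pop()
        for before, after in orderings:
            if before == node and after not in seen:
                if after == second:
                    return True
                seen.add(after)
                stack.append(after)
    return False


def threatens(action: str, effects: Iterable[Effect], link: Link,
              orderings: Set[Ordering]) -> bool:
    """True if `action` may change the linked variable between producer and consumer."""
    producer, (var, value), consumer = link
    if action in (producer, consumer):
        return False
    # Demotion or promotion already in place: the link is safe.
    if precedes(orderings, action, producer) or precedes(orderings, consumer, action):
        return False
    return any(
        v == var and ((positive and d != value) or (not positive and d == value))
        for v, d, positive in effects
    )


link = ("init", ("pos-rm", "l2"), "load-t1-rm-l2")
orderings = {("init", "load-t1-rm-l2")}
# Hypothetical third action that moves rm somewhere else.
moving_effects = [("pos-rm", "sf", True)]
unsafe = threatens("move-rm", moving_effects, link, orderings)
# Promotion: order the threatening action after the consumer of the link.
promoted = orderings | {("load-t1-rm-l2", "move-rm")}
safe = not threatens("move-rm", moving_effects, link, promoted)
```

Here unsafe evaluates to True before any ordering is added, and the
promotion ordering removes the threat.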

A flaw-free plan is a threat-free partial plan in which the preconditions
of all the actions are supported through causal links.

Planning agents in FMAP cooperate to solve MAP tasks by progressively
refining an initially empty plan $\Pi$ until a solution is reached. The
definition of refinement plan is closely related to the internal
forward-chaining partial-order planning search performed by the agents.
Refinement planning is a technique that is widely used by many planners,
specifically in anytime planning, where a first initial solution is
progressively refined until the deliberation time expires [51]. We define a
refinement plan as follows:

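Definition 3 A refinement plan
$\Pi_r = \langle \Delta_r, \mathcal{OR}_r, \mathcal{CL}_r \rangle$ over a
partial plan $\Pi = \langle \Delta, \mathcal{OR}, \mathcal{CL} \rangle$ is
a flaw-free partial plan that extends $\Pi$, i.e.,
$\Delta \subset \Delta_r$, $\mathcal{OR} \subset \mathcal{OR}_r$ and
$\mathcal{CL} \subset \mathcal{CL}_r$. $\Pi_r$ introduces a new action
$\alpha \in \Delta_r$ in $\Pi$, resulting in
$\Delta_r = \Delta \cup \{\alpha\}$. All the preconditions in
$PRE(\alpha)$ are linked to existing actions in $\Pi$ through causal links;
i.e., all preconditions are supported:
$\forall p \in PRE(\alpha), \exists\, \beta \xrightarrow{p} \alpha \in \mathcal{CL}_r$,
where $\beta \in \Delta$.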

Refinement plans in FMAP include actions that can be executed in parallel
by different agents. Some MAP models consider that two parallel or
non-sequential actions are mutually consistent if neither of them modifies
the value of a state variable that the other relies on or affects [5]. We
also consider that the preconditions of two mutually consistent actions
have to be consistent [5]. Hence, two non-sequential actions
$\alpha \in \mathcal{A}^i$ and $\beta \in \mathcal{A}^j$ are mutually
consistent if none of the following conditions hold:

- $\exists (v = d) \in EFF(\alpha)$ and
  $\exists (\langle v, d' \rangle \in PRE(\beta) \vee \langle v, \neg d \rangle \in PRE(\beta))$,
  where $v \in \mathcal{V}^{ij}$, $d \in \mathcal{D}^{ij}_v$,
  $d' \in \mathcal{D}^j_v$ and $d \neq d'$, or vice versa; that is, the
  effects of $\alpha$ and the preconditions of $\beta$ (or vice versa) must
  not contradict each other under the specified conditions.

- $\exists (v = d) \in EFF(\alpha)$ and
  $\exists ((v = d') \in EFF(\beta) \vee (v \neq d) \in EFF(\beta))$, where
  $v \in \mathcal{V}^{ij}$, $d \in \mathcal{D}^{ij}_v$,
  $d' \in \mathcal{D}^j_v$ and $d \neq d'$, or vice versa; that is, the
  effects of $\alpha$ and the effects of $\beta$ (or vice versa) must not
  contradict each other under the specified conditions.

- $\exists \langle v, d \rangle \in PRE(\alpha)$ and
  $\exists (\langle v, d' \rangle \in PRE(\beta) \vee \langle v, \neg d \rangle \in PRE(\beta))$,
  where $v \in \mathcal{V}^{ij}$, $d \in \mathcal{D}^{ij}_v$,
  $d' \in \mathcal{D}^j_v$ and $d \neq d'$, or vice versa; that is, the
  preconditions of $\alpha$ and the preconditions of $\beta$ (or vice
  versa) must not contradict each other under the specified conditions.

Agents address parallelism by the resolution of threats over the causal
links of the plan. Thus, consistency between any two non-sequential actions
introduced by different agents is always guaranteed as refinement plans are
flaw-free plans.
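The three conditions for mutual consistency share one pattern: a positive
item on a shared variable in one set is contradicted by an item on the same
variable in the other set. The following Python sketch exploits that
pattern. It is a simplification under stated assumptions: it restricts the
check to the shared variables but does not model the value-level domain
restrictions, and the names clashes and mutually_consistent, as well as the
second load action, are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding used for both preconditions and effects:
# (variable, value, positive). For preconditions, positive=True is <v, d>
# and False is <v, not d>; for effects they are (v = d) and (v != d).
Item = Tuple[str, str, bool]


def clashes(first: Iterable[Item], second: Iterable[Item],
            shared_vars: Set[str]) -> bool:
    """True if a positive (v, d) in `first` is contradicted in `second`."""
    second = list(second)
    for v, d, positive in first:
        if not positive or v not in shared_vars:
            continue
        for v2, d2, positive2 in second:
            if v2 != v:
                continue
            # Contradiction: a different value, or the negation of the same value.
            if (positive2 and d2 != d) or (not positive2 and d2 == d):
                return True
    return False


def mutually_consistent(pre_a: Set[Item], eff_a: Set[Item],
                        pre_b: Set[Item], eff_b: Set[Item],
                        shared_vars: Set[str]) -> bool:
    """True if none of the three conditions (in either direction) holds."""
    pairs = [
        (eff_a, pre_b), (eff_b, pre_a),  # effects vs. preconditions
        (eff_a, eff_b), (eff_b, eff_a),  # effects vs. effects
        (pre_a, pre_b), (pre_b, pre_a),  # preconditions vs. preconditions
    ]
    return not any(clashes(x, y, shared_vars) for x, y in pairs)


shared = {"pos-rm"}
load_t1_pre = {("pos-rm", "l2", True), ("pos-t1", "l2", True)}
load_t1_eff = {("pos-rm", "t1", True)}
drive_t2_pre = {("pos-t2", "l3", True)}
drive_t2_eff = {("pos-t2", "l4", True)}
# Hypothetical competing action: ta2 loads rm at sf.
load_t2_pre = {("pos-rm", "sf", True), ("pos-t2", "sf", True)}
load_t2_eff = {("pos-rm", "t2", True)}

ok = mutually_consistent(load_t1_pre, load_t1_eff,
                         drive_t2_pre, drive_t2_eff, shared)
conflict = not mutually_consistent(load_t1_pre, load_t1_eff,
                                   load_t2_pre, load_t2_eff, shared)
```

Loading rm into t1 and driving t2 touch disjoint variables, so they can run
in parallel, whereas the two load actions disagree on pos-rm and are
reported as inconsistent.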

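Finally, a solution plan for $\mathcal{T}_{MAP}$ is a refinement plan
$\Pi = \langle \Delta, \mathcal{OR}, \mathcal{CL} \rangle$ that addresses
all the global goals $\mathcal{G}$ of $\mathcal{T}_{MAP}$. A solution plan
includes the fictitious final action $\alpha_f$ and ensures that all its
preconditions (note that $PRE(\alpha_f) = \mathcal{G}$) are satisfied; that
is,
$\forall g \in PRE(\alpha_f), \exists\, \beta \xrightarrow{g} \alpha_f \in \mathcal{CL}$,
$\beta \in \Delta$, which is the necessary condition to guarantee that
$\Pi$ solves $\mathcal{T}_{MAP}$.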

3.1 Privacy in partial plans 

Every time an agent $i$ refines a partial plan by introducing a new action
$\alpha \in \mathcal{A}^i$, it communicates the resulting refinement plan
to the rest of the agents in $\mathcal{T}_{MAP}$. As stated above, the
information that is public to a pair of agents is defined according to the
common state variables and domain values. In order to preserve privacy,
agent $i$ will only communicate to agent $j$ the fluents in action $\alpha$
whose variables are common to both agents. The information of a refinement
plan $\Pi$ that agent $j$ receives from agent $i$ configures its view of
that plan, $view^j(\Pi)$. More specifically, given two agents $i$ and $j$
and a fluent $\langle v, d \rangle$, where $v \in \mathcal{V}^i$ and
$d \in \mathcal{D}^i_v$ (equivalently for a negative fluent
$\langle v, \neg d \rangle$), we distinguish the three following cases:

- Public fluent: if $v \in \mathcal{V}^{ij}$ and
  $d \in \mathcal{D}^{ij}_v$, the fluent $\langle v, d \rangle$ is public
  to both $i$ and $j$, and thus agent $i$ will send agent $j$ all the
  causal links, preconditions, and effects regarding
  $\langle v, d \rangle$.

- Private fluent to agent $i$: if $v \notin \mathcal{V}^{ij}$, the fluent
  $\langle v, d \rangle$ is private to agent $i$ w.r.t. agent $j$, and thus
  agent $i$ will occlude the preconditions and effects regarding
  $\langle v, d \rangle$ to agent $j$. Causal links of the form
  $\alpha \xrightarrow{\langle v, d \rangle} \beta$ will be sent to agent
  $j$ as ordering constraints $\alpha \prec \beta$.

- Partially private fluent to agent $i$: if $v \in \mathcal{V}^{ij}$ but
  $d \notin \mathcal{D}^{ij}_v$, the fluent $\langle v, d \rangle$ is
  partially private to agent $i$ w.r.t. agent $j$. Instead of
  $\langle v, d \rangle$, agent $i$ will send agent $j$ a fluent
  $\langle v, \perp \rangle$, where $\perp$ is the undefined value. Hence,
  preconditions of the form $\langle v, d \rangle$ will be sent as
  $\langle v, \perp \rangle$, effects of the form $(v = d)$ will be
  replaced by $(v = \perp)$, and causal links
  $\alpha \xrightarrow{\langle v, d \rangle} \beta$ will adopt the form
  $\alpha \xrightarrow{\langle v, \perp \rangle} \beta$.

If an agent $j$ receives a fluent $\langle v, \perp \rangle$, $\perp$ is
interpreted as follows:
$\forall d \in \mathcal{D}^j_v, \langle v, \neg d \rangle$. That is,
$\perp$ indicates that $v$ is not assigned any of the values known to agent
$j$ ($\mathcal{D}^j_v$). This mechanism is used to inform an agent that a
resource is no longer available in its influence area. For instance,
suppose that agent ta2 in Example 1 acquires the raw material rm from sf by
loading it into its truck t2. Agent ta2 communicates to ta1 that rm is no
longer in sf, but agent ta1 does not know about the truck t2.

To solve this issue, ta2 sends ta1 the fluent
$\langle pos\text{-}rm, \perp \rangle$, meaning that the resource rm is no
longer available in the geographical area of agent ta1. Consequently, ta1
is now aware that rm is not located in any of its accessible positions
$\mathcal{D}^{ta1}_{pos\text{-}rm} = \{l1, l2, sf, t1\}$.
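The three privacy cases and the interpretation of the undefined value can
be summarized in a short Python sketch. The function names project_fluent
and interpret, the use of None for an occluded fluent, and the string used
for the undefined value are assumptions of this illustration, not FMAP's
actual message format.

```python
from typing import Dict, Optional, Set, Tuple

UNDEFINED = "undefined"  # stands for the undefined value of the text


def project_fluent(var: str, value: str, public_vars: Set[str],
                   public_vals: Dict[str, Set[str]]) -> Optional[Tuple[str, str]]:
    """Return what the sender discloses about <var, value> to one receiver."""
    if var not in public_vars:
        return None                      # private fluent: occluded
    if value not in public_vals.get(var, set()):
        return (var, UNDEFINED)          # partially private: value hidden
    return (var, value)                  # public fluent: sent as is


def interpret(fluent: Tuple[str, str],
              known_values: Set[str]) -> Set[Tuple[str, str, bool]]:
    """Expand a received fluent into (variable, value, positive) triples."""
    var, value = fluent
    if value == UNDEFINED:
        # The variable takes none of the values known to the receiver.
        return {(var, d, False) for d in known_values}
    return {(var, value, True)}


# Public information between ta2 (sender) and ta1 (receiver).
public_vars = {"pos-rm"}
public_vals = {"pos-rm": {"sf"}}

sent = project_fluent("pos-rm", "t2", public_vars, public_vals)
hidden = project_fluent("pos-t2", "sf", public_vars, public_vals)
received = interpret(sent, {"l1", "l2", "sf", "t1"})
```

In this run, ta2's fluent on pos-t2 is not disclosed at all, while the new
position of rm is sent with the undefined value, which ta1 expands into one
negative fluent per location it knows.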

Fig. 2 shows the view that the transport agents ta1 and ta2 in Example 1
have of a simple refinement plan $\Pi_r$. In this plan, agent ta1 drives
the truck t1 from l1 to l2 and loads rm into t1. As shown in Fig. 2a,
$view^{ta1}(\Pi_r)$ contains all the information of both actions in the
plan since agent ta1 has introduced them. Agent ta2, however, does not know
about the truck t1, and hence the variable pos-t1, which models the
position of t1, is private to ta1 w.r.t. ta2. This way, all the
preconditions and effects related to the fluents
$\langle pos\text{-}t1, l1 \rangle$ and $\langle pos\text{-}t1, l2 \rangle$
are occluded in $view^{ta2}(\Pi_r)$ (see Fig. 2b). Additionally, the causal
links regarding these two fluents are replaced by ordering constraints in
$view^{ta2}(\Pi_r)$. On the other hand, the variable pos-rm is public to
both agents, but the load action refers to the locations t1 and l2, which
are not in $\mathcal{D}^{ta1\,ta2}_{pos\text{-}rm}$. Therefore, fluents
$\langle pos\text{-}rm, l2 \rangle$ and $\langle pos\text{-}rm, t1 \rangle$
are partially private to agent ta1 w.r.t. ta2. This way, in
$view^{ta2}(\Pi_r)$, the precondition $\langle pos\text{-}rm, l2 \rangle$
and the effect $(pos\text{-}rm = t1)$ of the load action are replaced by
$\langle pos\text{-}rm, \perp \rangle$ and $(pos\text{-}rm = \perp)$,
respectively. The fluent $\langle pos\text{-}rm, l2 \rangle$ is also
replaced by $\langle pos\text{-}rm, \perp \rangle$ in the causal link
$\alpha_i \xrightarrow{\langle pos\text{-}rm, l2 \rangle}$ load t1 rm l2.


Fig. 2 A refinement plan $\Pi_r$ as viewed by: a) agent ta1 b) agent ta2

3.2 MAP definition language

There is a large body of work on planning task specification languages.
Since planning has been traditionally regarded as a centralized problem,
the most popular definition languages, such as the different versions of
PDDL (the Planning Domain Definition Language, see footnote 1), are
designed to model single-agent planning tasks. MAP introduces a set of
requirements that are not present in single-agent planning, such as privacy
or specialized agents, which motivate the development of specification
languages for multi-agent planning.

There are many different approaches to MAP as described in Section 2.
MA-STRIPS [4], which was designed as a minimalistic extension to STRIPS
[10], is one of the most common MAP languages. It allows defining a set of
agents and associating the planning actions they can execute. FMAP presents
several advanced features that motivated the definition of our own
PDDL-based specification language (the language syntax is detailed in [38])
rather than using MA-STRIPS.

Since the world states in FMAP are modeled through state variables instead
of predicates, our MAP language is based on PDDL3.1 [20], the latest
version of PDDL. Unlike its predecessors, which model planning tasks
through predicates, PDDL3.1 incorporates state variables that map to a
finite domain of objects of the task.

In a single-agent language, the user specifies the domain of the task
(planning operators, types of objects, and functions) and the problem to be
solved (objects of the task, initial state, and goals). In FMAP, we write a
domain and a problem file for each agent, which define the typology of the
agent and the agent’s local view of the MAP task, respectively. The domain
files keep the structure of a regular PDDL3.1 domain file. The problem
files, however, are extended with an additional :shared-data section, which
specifies the information that an agent can share with each of the other
participating agents in the task.

4 FMAP refinement planning procedure 

FMAP is based on a cooperative refinement planning
procedure in which agents jointly explore a multi-agent,
plan-space search tree. A multi-agent search tree is one
in which the partial plans of the nodes are built with
the contributions of one or more agents.

¹ http://en.wikipedia.org/wiki/Planning_Domain_Definition_Language

Fig. 3 shows the first level of the multi-agent search
tree that would be generated for the transportation task
of Example 1. At this level, agents ta1 and ta2 each
propose two refinement plans, specifically the plans to
move their trucks within their geographical areas. In
each of these refinement plans, the agent adds one action
and the corresponding orderings and causal links.
Agent f does not contribute any refinement plan here
because the initial empty plan $\Pi_0$ does not have
the necessary supporting information for f to insert any
of its actions. In a subsequent iteration (expansion of
the next tree node), agents can in turn create new
refinement plans. For instance, if node $\Pi_{00}$ in Fig. 3 is
selected next for expansion, the three agents in the
problem (ta1, ta2, and f) will try to create refinement plans
over $\Pi_{00}$ by adding one of their actions and supporting
it through the necessary causal links and orderings.

Agents keep a copy of the multi-agent search tree,
storing the local view they have of each of the plans
in the tree nodes. Given a node $\Pi$ in the multi-agent
search tree, an agent i maintains $view^i(\Pi)$ in its copy
of the tree.

FMAP applies a multi-agent A* search that iteratively
explores the multi-agent tree. One iteration of
FMAP involves the following: 1) agents select one of
the unexplored leaf nodes of the tree for expansion; 2)
agents expand the selected plan by generating all the
refinement plans over this node; and 3) agents evaluate
the resulting successor nodes and communicate the results
to the rest of the agents. Instead of using a broadcast
control framework, FMAP uses democratic leadership,
in which a coordinator role is scheduled among the
agents. One of the agents adopts the role of coordinator
at each iteration, thus leading the procedure in that
iteration (initially, the coordinator role is randomly
assigned to one of the participating agents). More
specifically, an FMAP iteration is as follows:

- Base plan selection: Among all the open nodes
(unexplored leaf nodes) of the multi-agent search
tree, the coordinator agent selects the most promising
plan, $\Pi_b$, as the base plan to refine in the current
iteration. $\Pi_b$ is selected according to the evaluation
of the open nodes (details on the node evaluation
and selection are presented in section 4.3). In the
initial iteration, the base plan is the empty plan $\Pi_0$.

- Refinement plan generation: Agents expand $\Pi_b$
and generate its successor nodes. A successor node
is a refinement plan over $\Pi_b$ that an agent generates
individually through its embedded forward-chaining
partial-order planner (see subsection 4.1).

- Refinement plan evaluation: Each agent i evaluates
its refinement plans $\Pi_r$ by applying a classical
A* evaluation function,
$f(\Pi_r) = g(view^i(\Pi_r)) + h(view^i(\Pi_r))$.
The expression $g(view^i(\Pi_r))$ stands
for the number of actions of $\Pi_r$. Since agents view
all the actions of the plans (but not necessarily all
their preconditions and effects), $g(view^i(\Pi_r))$ is
equivalent to $g(\Pi_r)$. $h(view^i(\Pi_r))$ applies our DTG-based
heuristic (see subsection 4.3) to estimate the cost of
reaching a solution plan from $\Pi_r$.

- Refinement plan communication: Each agent
communicates its refinement plans to the rest of the
agents. The information that an agent i communicates
about its plan $\Pi_r$ to the rest of the agents
depends on the level of privacy specified with each
of them. Along with the refinement plan $\Pi_r$, agent
i communicates the result of the evaluation of $\Pi_r$,
$f(\Pi_r)$.


Once the iteration is completed, the leadership is
handed to another agent, which adopts the coordinator
role, and a new iteration starts. The next coordinator
agent selects the open node $\Pi$ that minimizes $f(\Pi)$
as the new base plan $\Pi_b$, and then agents proceed to
expand it. This iterative process carries on until $\Pi_b$
becomes a solution plan that supports the final action
$\alpha_f$, or until all the open nodes have been visited, in
which case the agents will have explored the complete
search space without finding a solution for the MAP
task $\mathcal{T}_{MAP}$.

A refinement plan $\Pi$ is evaluated only by the agent
that generates it. The agent communicates $\Pi$ along
with $f(\Pi)$ to the rest of the agents. Therefore, the
decision on the next base plan is not affected by the agent
that plays the coordinator role, since all of the agents
manage the same $f(\Pi)$ value for every open node $\Pi$.
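
The iteration scheme described above can be illustrated with a small single-process simulation. This is only a sketch under simplifying assumptions: the function name `fmap_search`, the agent interface (a callable that returns `(f_value, plan)` pairs), and the toy agents are hypothetical, message passing and privacy are omitted, and the coordinator role is rotated round-robin from a fixed starting agent rather than a randomly chosen one.

```python
import heapq
from itertools import count


def fmap_search(agents, empty_plan, is_solution, max_iterations=1000):
    """Single-process simulation of the FMAP iteration loop.

    agents: dict name -> refine(plan), returning a list of (f_value, plan)
            pairs (hypothetical interface; each agent evaluates only the
            refinement plans that it generates itself).
    Returns (solution_plan_or_None, list_of_coordinator_names).
    """
    names = list(agents)
    tie = count()                        # tie-breaker: plans are never compared
    open_nodes = [(0, next(tie), empty_plan)]
    coordinators = []
    for iteration in range(max_iterations):
        if not open_nodes:               # every open node has been visited
            return None, coordinators
        # Leadership rotates among the agents (round-robin in this sketch).
        coordinators.append(names[iteration % len(names)])
        # Base plan selection: the open node with the minimum f value.
        _, _, base = heapq.heappop(open_nodes)
        if is_solution(base):
            return base, coordinators
        # Generation, evaluation and "communication": every agent refines
        # the base plan and its successors are added to the shared tree.
        for name in names:
            for f_value, plan in agents[name](base):
                heapq.heappush(open_nodes, (f_value, next(tie), plan))
    return None, coordinators


# Toy task: plans are tuples of action names and the goal is ("a", "b").
def agent_a(plan):
    # Can only add "a" to the empty plan; f = g + h with g = 1, h = 1.
    return [(2, plan + ("a",))] if not plan else []


def agent_b(plan):
    # Can only add "b" right after "a"; f = g + h with g = 2, h = 0.
    return [(len(plan) + 1, plan + ("b",))] if plan[-1:] == ("a",) else []


solution, leaders = fmap_search(
    {"A": agent_a, "B": agent_b}, (), lambda plan: plan == ("a", "b")
)
```

Because every agent would hold the same $f$ value for each open node, the base plan popped at each step does not depend on which agent is listed as coordinator; the rotation only determines who leads the iteration.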

In the example depicted in Fig. 3, agent ta1 evaluates
its refinement plans, $\Pi_{00}$ and $\Pi_{01}$, and communicates
them along with $f(\Pi_{00})$ and $f(\Pi_{01})$ to agents ta2
and f; likewise, ta2 with ta1 and f. In this first level
of the tree, agents ta1 and ta2 have a complete view
of the refinement plans that they have generated, since
these plans only contain an action that they themselves
introduced. However, when ta1 and ta2 communicate
their plans to each other, they will only send the fluents
according to the level of privacy defined between
them, as described in subsection 3.1. This way, ta1 will
send $view^{ta2}(\Pi_{00})$ and $view^{ta2}(\Pi_{01})$ to agent ta2, and
$view^{f}(\Pi_{00})$ and $view^{f}(\Pi_{01})$ to agent f.

Fig. 4 Loading rm in plan $\Pi_{001}$: a) inserting actions from
a frontier state; b) inserting actions using FLEX

The following subsections analyze the key elements 
of FMAP, that is, the search algorithm that agents 
use for the generation of the refinement plans and the 
heuristic function they use for plan evaluation. We also 
include a subsection that addresses the completeness 
and correctness of the algorithm as well as a subsection 
that describes the limitations of FMAP. 


4.1 Forward-Chaining Partial-Order Planning 

Agents in FMAP use an embedded flexible forward-chaining
POP system to generate the refinement plans;
this will be referred to as FLEX in the remainder of the
paper. Similarly to other approaches, FLEX explores
the potential of forward search to support partial-order
planning. OPTIC [1], for instance, combines partial-order
structures with information on the frontier state
of the plan. Informally speaking, the frontier state of
the partial plan of a tree node is the resulting state
after executing the actions in such a plan. Given a
refinement plan $\Pi = \langle \Delta, \mathcal{OR}, \mathcal{CL} \rangle$, we define its frontier
state $FS(\Pi)$ as the set of fluents $\langle v, d \rangle$ achieved by
actions $\alpha \in \Delta \mid \langle v, d \rangle \in EFF(\alpha)$, such that any
action $\alpha' \in \Delta$ that modifies the value of the variable v
($\langle v, d' \rangle \in EFF(\alpha') \mid d \neq d'$) is not reachable from $\alpha$ by
following the orderings and causal links in $\Pi$.
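
As an illustration of this definition, the following sketch computes a frontier state directly from reachability over the orderings and causal links. The data layout (a dictionary of effects per action and a list of `(before, after)` edges) and the name `frontier_state` are assumptions made for the example, not part of FMAP.

```python
def frontier_state(effects, edges):
    """Frontier state of a partial-order plan, following the definition.

    effects: dict action -> dict variable -> value (the action's effects).
    edges:   iterable of (before, after) pairs, covering both orderings
             and causal links (assumed layout for this sketch).
    """
    successors = {action: set() for action in effects}
    for before, after in edges:
        successors[before].add(after)

    def reachable(start):
        # Actions that can be reached from `start` by following the edges.
        seen, stack = set(), [start]
        while stack:
            for nxt in successors[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    state = set()
    for action, action_effects in effects.items():
        later = reachable(action)
        for var, val in action_effects.items():
            # <var, val> is dropped if a reachable action changes var.
            overwritten = any(
                var in effects[other] and effects[other][var] != val
                for other in later
            )
            if not overwritten:
                state.add((var, val))
    return state


# Toy plan: the initial action followed by two consecutive drive actions.
effects = {
    "init": {"pos-t1": "l1", "pos-rm": "l2"},
    "drive_l1_l2": {"pos-t1": "l2"},
    "drive_l2_sf": {"pos-t1": "sf"},
}
edges = [("init", "drive_l1_l2"), ("drive_l1_l2", "drive_l2_sf")]
fs = frontier_state(effects, edges)
```

In this toy plan the truck ends at sf while the raw material stays at l2, so only the last value of each variable survives in the frontier state.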

The only actions that OPTIC adds to a plan are 
those whose preconditions hold in the frontier state. 
This behaviour forces OPTIC to some early commit¬ 
ments; however, this does not sacrifice completeness, 
because search can backtrack. Also, TFPOP [22] ap¬ 
plies a centralized forward-chaining POP for multiple 
agents, keeping a sequential execution thread per agent. 

The aforementioned approaches only permit introducing
actions that are applicable in the frontier state
of the plan. In contrast, FLEX allows inserting actions
at any position of the plan without assuming that any
action in the plan has already been executed. This is a
more flexible approach that is also more compliant with
the least-commitment principle that typically guides
backward-chaining POP. Fig. 4 shows the advantages
of our flexible search strategy. Consider the refinement
plan $\Pi_{001}$, which is the result of a refinement of agent
ta1 on plan $\Pi_{00}$ (see Fig. 3) after including the action
(drive t1 l2 sf). This is not the best course of action
for taking the raw material rm to the factory f, as ta1
should load rm into t1 before moving to sf. The frontier
state $FS(\Pi_{001})$ reflects the state of the world after
executing the plan $\Pi_{001}$, in which the truck t1 would be
at sf. Planners like OPTIC would only introduce actions
that are applicable in the frontier state $FS(\Pi_{001})$.
In this example, OPTIC would first insert the action
(drive t1 sf l2) to move the truck t1 back to l2 in order
to be able to apply the action (load t1 rm l2) (see
Fig. 4a). FLEX, however, is able to introduce actions
at any position in the plan, so the load action can be
directly placed between both drive actions, thus minimizing
the length of the plan (see Fig. 4b).


Algorithm 1: FLEX search algorithm for an agent i

  RP^i ← ∅
  if potentiallySupportable(α_f, view^i(Π_b)) then
      return solutionPlans
  CandidateActions ← ∅
  forall the α ∈ A^i do
      if potentiallySupportable(α, view^i(Π_b)) then
          CandidateActions ← CandidateActions ∪ α
  forall the α ∈ CandidateActions do
      Plans ← {view^i(Π_b)}
      repeat
          Select and extract Π_s ∈ Plans
          Flaws(Π_s) ← unsupportedPrecs(α, Π_s) ∪ Threats(Π_s)
          if Flaws(Π_s) = ∅ then
              RP^i ← RP^i ∪ Π_s
          else
              Select and extract Φ ∈ Flaws(Π_s)
              Plans ← Plans ∪ solveFlaw(Π_s, Φ)
      until Plans = ∅
  return RP^i


Algorithm 1 summarizes the FLEX procedure invoked
by an agent i to generate refinement plans, and
Fig. 5 shows how agent ta1 in Example 1 uses the FLEX
algorithm to refine plan $\Pi_{00}$ in Fig. 3. The first operation
of an agent i that executes FLEX is to check
whether the fictitious final action $\alpha_f$ is supportable in
$\Pi_b$, that is, if a solution plan can be obtained from $\Pi_b$.
If so, the agent will generate a set of solution plans that
covers all the possible ways to support the preconditions
of $\alpha_f$ through causal links.

Fig. 5 FLEX algorithm as applied by agent ta1 over plan
$\Pi_{00}$ (stages shown: base plan; estimate of potentially
supportable actions; candidate actions; independent POP
trees for each action; leaf nodes, i.e., refinement plans)

If a solution plan is not found, agent i analyzes
all its planning actions $\mathcal{A}^i$ and estimates if they are
supportable in $\Pi_b$. Given an action $\alpha \in \mathcal{A}^i$, the function
$potentiallySupportable(\alpha, \Pi_b)$ checks if
$\forall \langle v, d \rangle \in PRE(\alpha), \exists \beta \in \Delta(\Pi_b) \mid (v = d) \in EFF(\beta)$, i.e., the
agent estimates that $\alpha$ is supportable if for every
precondition of $\alpha$ there is a matching effect among the
actions of $\Pi_b$.

Fig. 5 shows an example of potentially supportable
actions. Agent ta1 evaluates all the actions in $\mathcal{A}^{ta1}$ and
finds five candidate actions. In $\alpha_i$, the initial state of
$\Pi_{00}$, the truck t1 is at location l1. Consequently, ta1
considers (drive t1 l1 sf) and (drive t1 l1 l2) as potential
candidate actions for its refinements. Note that action
(drive t1 l1 l2) is already included in plan $\Pi_{00}$. Actions
(drive t1 l2 sf), (drive t1 l2 l1), and (load t1 rm l2)
are also classified as candidates since they are applicable
after the action (drive t1 l1 l2), which is already in
plan $\Pi_{00}$.

It is possible to introduce an action multiple times
in a plan; for instance, a truck may need to travel back
and forth between two different locations several times.
For this reason, ta1 again considers (drive t1 l1 l2) as a
candidate action when refining $\Pi_{00}$, even if this action
is already included in $\Pi_{00}$. By estimating potentially
supportable actions in any position of the plan, FLEX
follows the least-commitment principle and does not
leave out any potential refinement plan.

The potentiallySupportable procedure is an estimate
because it does not actually check the possible
flaws that arise when supporting an action. Hence, an
agent analyzes the alternatives that support each candidate
action $\alpha$ by generating a POP search tree for that
particular action (repeat loop in Algorithm 1). All the
leaf nodes of the tree (stored in the Plans list in
Algorithm 1) are explored, thereby covering all the possible
ways to introduce $\alpha$ in $\Pi_b$.

As in backward-chaining POP, FLEX introduces the
action $\alpha$ in $\Pi_b$ by supporting its preconditions through
causal links and solving the threats that arise during
the search. The set of flaw-free plans obtained from this
search is stored in $RP^i$ as valid refinement plans of
agent i over $\Pi_b$. This procedure is carried out for each
candidate action. Completeness is guaranteed since all
the possible refinement plans over a given base plan are
generated by the agents involved in $\mathcal{T}_{MAP}$.
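
The structure of Algorithm 1 can be made concrete with a reduced, runnable sketch. It is not the FMAP implementation: the `Action` and `Plan` types and all function names are hypothetical, each action is assumed to be used at most once, and threat detection, threat resolution, and the solution-plan check on $\alpha_f$ are left out, so the only flaws handled are unsupported preconditions.

```python
from collections import namedtuple

# pre/eff: dicts variable -> value; links: frozenset of (producer, var, consumer)
Action = namedtuple("Action", "name pre eff")
Plan = namedtuple("Plan", "actions links")


def potentially_supportable(action, plan):
    # Every precondition must match an effect of some action in the plan.
    return all(
        any(b.eff.get(var) == val for b in plan.actions)
        for var, val in action.pre.items()
    )


def unsupported_precs(action, plan):
    supported = {var for (_, var, consumer) in plan.links
                 if consumer == action.name}
    return [(action, var) for var in action.pre if var not in supported]


def solve_flaw(plan, flaw):
    # One successor plan per existing action that achieves the precondition.
    action, var = flaw
    return [
        Plan(plan.actions, plan.links | {(b.name, var, action.name)})
        for b in plan.actions
        if b is not action and b.eff.get(var) == action.pre[var]
    ]


def flex(base_plan, agent_actions):
    refinement_plans = []
    candidates = [a for a in agent_actions
                  if a not in base_plan.actions
                  and potentially_supportable(a, base_plan)]
    for action in candidates:
        # Independent POP search for this candidate action.
        plans = [Plan(base_plan.actions + (action,), base_plan.links)]
        while plans:                       # repeat ... until Plans is empty
            plan = plans.pop()
            flaws = unsupported_precs(action, plan)
            if not flaws:
                refinement_plans.append(plan)
            else:
                plans.extend(solve_flaw(plan, flaws[0]))
    return refinement_plans


init = Action("init", {}, {"pos-t1": "l1", "pos-rm": "l2"})
drive_l1_l2 = Action("drive_l1_l2", {"pos-t1": "l1"}, {"pos-t1": "l2"})
load = Action("load", {"pos-t1": "l2", "pos-rm": "l2"}, {"pos-rm": "t1"})
drive_l2_sf = Action("drive_l2_sf", {"pos-t1": "l2"}, {"pos-t1": "sf"})
unload_sf = Action("unload_sf", {"pos-t1": "sf", "pos-rm": "t1"}, {"pos-rm": "sf"})

base = Plan((init, drive_l1_l2), frozenset({("init", "pos-t1", "drive_l1_l2")}))
plans = flex(base, [load, drive_l2_sf, unload_sf])
```

Here `unload_sf` is filtered out by the estimate because no action in the base plan places the truck at sf, while `load` and `drive_l2_sf` each yield one refinement plan.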

Fig. 5 shows that, for every candidate action, ta1
performs an independent POP search aimed at supporting
the action. Actions (load t1 rm l2), (drive t1 l2 sf),
and (drive t1 l2 l1) lead to three different refinement
plans over $\Pi_{00}$: $\{\Pi_{000}, \Pi_{001}, \Pi_{002}\}$. These plans will
then be inserted into ta1's copy of the multi-agent search
tree. Agent ta1 will also send the information of these
plans to agents ta2 and f according to the level of privacy
defined with each one. ta2 and f also store the
received plans in their copies of the tree.

Candidate action (drive t1 l1 sf) does not produce
valid refinement plans because it causes an unsolvable
threat. This is because truck t1 cannot simultaneously
move to two different locations from l1, which causes a
conflict between the existing action (drive t1 l1 l2) $\in \Delta(\Pi_{00})$
and (drive t1 l1 sf). Similarly, action (drive t1 l1 l2)
does not yield any valid refinements. The resulting plan
would have two actions (drive t1 l1 l2) in parallel, both
of which are linked to $\alpha_i$, which causes an unsolvable
threat because t1 cannot perform two identical drive
actions in parallel.

4.2 Completeness and Soundness 

As explained in the previous section, agents refine the
base plan concurrently by analyzing all of the possible
ways to support their actions in the base plan. Since
this operation is done by every agent and for all their
actions, we can conclude that FMAP is a complete
procedure that explores the whole search space.

As for soundness, a partial-order plan is sound if it is 
a flaw-free plan. The FLEX algorithm addresses incon¬ 
sistencies among actions in a partial plan by detecting 
and solving threats. 

When an agent i introduces an action $\alpha$ in a base
plan $\Pi$, FLEX studies the threats that $\alpha$ causes in the
causal links of $\Pi$ and the threats that the actions of $\Pi$
may cause in the causal links that support the
preconditions of $\alpha$. In both cases, i is able to detect all threats
whatever its view of the plan, $view^i(\Pi)$, is. That is,
FMAP soundness is guaranteed regardless of the level
of privacy defined between agents.

With regard to the threats caused by the effects of a
new action, privacy may prevent the agent from viewing
some of the causal links of the plan. Suppose that agent
i introduces an action $\alpha_t$ with an effect $(v = d')$ in plan
$\Pi$. Additionally, there is a causal link in $\Pi$ of the form
$cl = \alpha_p \xrightarrow{\langle v, d \rangle} \alpha_q$ introduced by an agent j; as cl is not
ordered with respect to $\alpha_t$, this situation generates a
threat. According to $view^i(\Pi)$, agent i may find one of
the following situations:

- If $\langle v, d \rangle$ is public to i and j, then cl is in $view^i(\Pi)$,
and thus the threat between cl and $\alpha_t$ will be correctly
detected and solved by promoting or demoting $\alpha_t$.

- If $\langle v, d \rangle$ is private to j w.r.t. i, then $\alpha_t$ cannot contain
an effect $(v = d')$ because $v \notin \mathcal{V}^i$. Therefore,
the threat described above can never occur in $\Pi$.

- If $\langle v, d \rangle$ is partially private to j w.r.t. i, then
$cl = \alpha_p \xrightarrow{\langle v, d \rangle} \alpha_q$ will be seen as
$cl = \alpha_p \xrightarrow{\langle v, \bot \rangle} \alpha_q$ in
$view^i(\Pi)$. Since $\bot \neq d'$, agent i will be able to detect
and address the threat between $\alpha_t$ and cl.

Consequently, an agent can always detect the arising
threats when it adds a new action, $\alpha_t$, in the plan. Now,
we should study whether the potential threats caused
by actions in $\Pi$ on the causal links that support the
action $\alpha_t$ are correctly detected by agent i. Suppose
that there is a causal link $cl' = \beta \xrightarrow{\langle v', e \rangle} \alpha_t$, and an
action $\gamma$ with an effect $(v' = e')$ which is not ordered
with respect to $\alpha_t$. Again, agent i may find itself in three
different scenarios according to its view of $(v' = e')$:

- If $(v' = e')$ is public to i and j, the threat between
cl' and $\gamma$ will be correctly detected by i.

- If $(v' = e')$ is private to j w.r.t. i, then none of
the variables in $PRE(\alpha_t)$ are related to v' because
$v' \notin \mathcal{V}^i$. Thus, this threat will never arise in $\Pi$.

- If $(v' = e')$ is partially private to j w.r.t. i, $(v' = e')$
will be seen as $(v' = \bot)$ in $view^i(\Pi)$. Since $\bot \neq e$, the
threat between $\gamma$ and cl' will be correctly detected
by agent i.

Note that privacy does not prevent agents from detecting
and solving threats, nor does it affect the complexity
of the process. If the fluent is public or partially
private, the agent that is refining the plan will be able
to detect the threat because it either sees the value of
the variable or sees $\bot$, and both contradict the value of
the variable in the causal link. If the fluent is private,
then there is no such threat. This proves that FMAP is
sound.
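
The three visibility cases can be checked with a few lines of code. The sketch below is an illustration only: the names `view_fluent` and `threatens`, the string labels for the privacy levels, and the `UNDEFINED` marker standing for $\bot$ are assumptions, and the action is taken to be unordered with respect to the causal link.

```python
UNDEFINED = "undefined"   # stands for the undefined value (bottom symbol)


def view_fluent(var, value, visibility):
    """How an observer sees the fluent <var, value> of another agent.

    visibility is one of "public", "partial" or "private" (labels assumed
    for this sketch).
    """
    if visibility == "public":
        return (var, value)
    if visibility == "partial":
        return (var, UNDEFINED)      # variable known, value masked
    return None                      # private: the variable is unknown


def threatens(effect, link_fluent):
    """True if an effect (v = d') conflicts with a link labelled <v, d>."""
    if link_fluent is None:          # private fluent: no threat can arise
        return False
    return effect[0] == link_fluent[0] and effect[1] != link_fluent[1]


effect = ("pos-t1", "sf")            # effect of the newly added action
public_case = threatens(effect, view_fluent("pos-t1", "l2", "public"))
partial_case = threatens(effect, view_fluent("pos-t1", "l2", "partial"))
private_case = threatens(effect, view_fluent("pos-t1", "l2", "private"))
```

The masked value differs from any concrete value, so the partially private case triggers the same conflict as the public one, which is the core of the soundness argument above.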

4.3 DTG-based Heuristic Function 

The last aspect of FMAP to analyze is how agents evaluate
the refinement plans. FMAP guides the search
through a domain-independent heuristic function, as
most planning systems do [?]. It uses the information
provided by the frontier states to perform the heuristic
evaluation of the plans contained in the tree nodes.

Fig. 6 Reduced transport example task

Fig. 7 Centralized and distributed DTG of the variable
(pos-rm)

According to the definition shown in section 4.1,
the frontier state of a plan $\Pi$, $FS(\Pi)$, can be easily
computed as the finite set of fluents that results from
executing the actions of the plan $\Pi$ in $\mathcal{I}$, the initial
state of $\mathcal{T}_{MAP}$. Since refinement plans are not sequential
plans, the actions in $\Delta$ have to be linearized in order
to compute the frontier state. The linearization of a
refinement plan $\Pi$ involves establishing a total order
among the actions in $\Delta$. Given two actions $\alpha \in \Delta$ and
$\beta \in \Delta$, if $\alpha \prec \beta \in \mathcal{OR}$ or $\beta \prec \alpha \in \mathcal{OR}$, we keep this
ordering constraint in the linearized plan. If $\alpha$ and $\beta$
are non-sequential actions, we establish a total ordering
among them. Since plans returned by FLEX are free of
conflicts, it is irrelevant how non-sequential actions are
ordered.
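
A linearization of this kind is a topological sort of the ordering constraints. The following sketch uses Python's standard `graphlib` module for the sort and then executes the actions in that order; the function names and the dictionary-based state are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def linearize(actions, orderings):
    """Total order over `actions` that respects every (before, after) pair."""
    sorter = TopologicalSorter({action: set() for action in actions})
    for before, after in orderings:
        sorter.add(after, before)    # `before` must precede `after`
    return list(sorter.static_order())


def frontier_state_by_execution(initial_state, effects, actions, orderings):
    """Apply the effects of the linearized actions to the initial state."""
    state = dict(initial_state)
    for action in linearize(actions, orderings):
        state.update(effects[action])
    return state


effects = {
    "drive_l1_l2": {"pos-t1": "l2"},
    "load": {"pos-rm": "t1"},
    "drive_l2_sf": {"pos-t1": "sf"},
}
orderings = [("drive_l1_l2", "load"), ("load", "drive_l2_sf")]
order = linearize(list(effects), orderings)
fs = frontier_state_by_execution(
    {"pos-t1": "l1", "pos-rm": "l2"}, effects, list(effects), orderings
)
```

When several actions are unordered, `static_order` picks one valid order among them; for conflict-free plans the resulting frontier state is the same whichever order is chosen.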

Frontier states allow us to make use of state-based
heuristics such as $h_{FF}$, the relaxed planning graph (RPG)
heuristic of FF [?]. However, the distributed approach
and the privacy model of FMAP make the application
of $h_{FF}$ inadequate to guide the search. Since none
of the agents has knowledge that is complete enough
to build an RPG by itself, using $h_{FF}$ to estimate the
quality of a refinement plan involves agents building
a distributed RPG [42]. This is a costly process that
requires many communications among agents to
coordinate with each other, and it has to be repeated for
the evaluation of each refinement plan. Therefore, the
predictable high computational cost of the application
of $h_{FF}$ led us to discard this choice and opt for
designing a heuristic that is based on Domain Transition
Graphs (DTGs) [14].

A DTG is a directed graph that shows the ways in
which a variable can change its value [14]. Each transition
is labeled with the necessary conditions for this to
happen; i.e., the preconditions that are common to all
the actions that induce the transition. Since DTGs are
independent of the state of the plan, recalculations are
avoided during the planning process.

Privacy is kept in DTGs through the use of the undefined
value $\bot$. This value is represented in a DTG like
the rest of the values of the variables, the only difference
being that transitions from/to $\bot$ are labeled with
the agents that induce them.

Consider a reduced version of Example 1 that is depicted
in Fig. 6. In this example, both transport agents
ta1 and ta2 can use truck t1 within their geographical
areas ga1 and ga2, respectively. Fig. 7 shows the
DTG of the variable (pos-rm). In a single-agent task
(upper diagram), all the information is available in the
DTG. However, in the multi-agent task (bottom diagrams),
agent ta1 does not know the location of rm if
ta2 transports it to f, while ta2 does not know the initial
placement of rm, since location l1 lies outside ta2’s
geographical area, ga2. In order to evaluate the cost
of achieving $\langle \text{pos-rm}, f \rangle$ from the initial state, ta1 will
first check its DTG, thus obtaining the cost of loading
rm in t1. As shown in Fig. 7, the transition between values
t1 and $\bot$ is labeled with agent ta2. Therefore, ta1
will ask ta2 for the cost of the path between values t1
and f to complete the calculation. Communications are
required to evaluate multi-agent plans, but DTGs are
more efficient than RPGs because they remain constant
during planning, so agents can minimize the overhead
by memorizing paths and distances between values.

For a given plan $\Pi$, our DTG-based heuristic function
($h_{DTG}$ in the following) returns the number of actions
of a relaxed plan between the frontier state $FS(\Pi)$
and the set of goals of $\mathcal{T}_{MAP}$, $\mathcal{G}$. $h_{DTG}$ performs a
backward search introducing the actions that support the
goals in $\mathcal{G}$ into the relaxed plan until all their
preconditions are supported. Hence, the underlying principle of
$h_{DTG}$ is similar to $h_{FF}$, except for the fact that DTGs
are used instead of RPGs to build the relaxed plan.

The $h_{DTG}$ evaluation of a plan $\Pi$ begins by calculating
the frontier state $FS(\Pi)$. Next, an iterative
procedure is performed to build the relaxed plan. This
procedure manages a list of fluents, openGoals, initially
set to $\mathcal{G}$. The process iteratively extracts a fluent from
openGoals and supports it through the introduction of
an action in the relaxed plan. The preconditions of such
an action are then included in the openGoals list. For
each variable $v \in \mathcal{V}$, the procedure manages a list of
values, $Values_v$, which is initialized to the value of v
in the frontier state $FS(\Pi)$. For each action added to
the relaxed plan that has an effect $(v = d')$, d' will be
stored in $Values_v$. An iteration of the $h_{DTG}$ evaluation
process executes the following stages:

- Open goal selection: From the openGoals set, the
procedure extracts the fluent $\langle v, d_g \rangle \in openGoals$ that
requires the largest number of value transitions to
be supported.

- DTG path computation: For every value $d_0$ in
$Values_v$, this stage calculates the shortest sequence
of value transitions in v's DTG from $d_0$ to $d_g$. Each
path is computed by applying Dijkstra’s algorithm
between the nodes $d_0$ and $d_g$ in the DTG associated
with variable v. The path with the minimum length is
stored as $minPath = ((d_0, d_1), (d_1, d_2), \ldots, (d_{p-1}, d_g))$.

- Relaxed plan construction: For each value transition
$(d_i, d_{i+1}) \in minPath$, the minimum-cost action
$\alpha_{min}$ that produces such a transition is introduced
in the relaxed plan; that is, $\langle v, d_i \rangle \in PRE(\alpha_{min})$
and $(v = d_{i+1}) \in EFF(\alpha_{min})$. The cost of an action
is computed as the sum of the minimum number of
value transitions required to support its preconditions.
The unsupported preconditions of $\alpha_{min}$ are
stored in openGoals, so they will be supported in
the subsequent iterations. For each effect
$(v' = d') \in EFF(\alpha_{min})$, the value d' is stored in $Values_{v'}$, so
d' can be used in the following iterations to support
other open goals.

The iterative evaluation procedure carries on until
all the open goals have been supported, that is,
$openGoals = \emptyset$, and $h_{DTG}$ returns the number of
actions in the relaxed plan.
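
The DTG path computation stage can be sketched as follows. The example assumes unit-cost transitions and a made-up DTG for the variable (pos-rm) that is not the one shown in Fig. 7; the function names and the adjacency-dictionary representation are likewise hypothetical.

```python
import heapq


def shortest_transition_path(dtg, start, goal):
    """Dijkstra's algorithm over a DTG with unit-cost value transitions.

    dtg: dict value -> iterable of values reachable in one transition.
    Returns the list of transitions (d_i, d_i+1), or None if unreachable.
    """
    queue = [(0, start, ())]
    visited = set()
    while queue:
        cost, value, path = heapq.heappop(queue)
        if value == goal:
            return list(path)
        if value in visited:
            continue
        visited.add(value)
        for nxt in dtg.get(value, ()):
            if nxt not in visited:
                heapq.heappush(queue, (cost + 1, nxt, path + ((value, nxt),)))
    return None


def min_path(dtg, values_v, goal):
    """Shortest path to `goal` starting from any value stored in Values_v."""
    paths = [shortest_transition_path(dtg, d0, goal) for d0 in values_v]
    paths = [path for path in paths if path is not None]
    return min(paths, key=len) if paths else None


# Hypothetical DTG: rm can be loaded at l1 or l2 and unloaded at l1, l2 or f.
dtg_pos_rm = {"l1": ["t1"], "l2": ["t1"], "t1": ["l1", "l2", "f"]}
from_l1 = min_path(dtg_pos_rm, ["l1"], "f")
from_l1_or_t1 = min_path(dtg_pos_rm, ["l1", "t1"], "f")
```

Once a relaxed-plan action has placed rm in the truck, the value t1 is available in the values list and the remaining path to f shortens to a single transition, which mirrors how $Values_v$ grows across iterations.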


4.4 Limitations of FMAP 

In this section, we present some limitations of FMAP 
that are worth discussing. FMAP builds upon the POP 
paradigm, so it can handle plans with parallel actions 
and only enforces an ordering when strictly necessary. 
FMAP, however, does not yet explicitly manage time
constraints or durative actions. A POP-based planner
can easily be extended to incorporate time because the
application of the least-commitment principle provides
a high degree of execution flexibility. Additionally, POP
is independent of the assumption that actions must be
instantaneous or have the same duration and allows actions
of arbitrary duration and different types of temporal
constraints to be defined as long as the conditions
under which actions interfere are well defined [53]. In
short, POP represents a natural and very appropriate
way to include and handle time in a planning framework.

FLEX involves the construction of a POP tree for
each potentially supportable action (see Fig. 5). This
procedure is more costly than the operations required
by a standard planner to refine a plan. However, the
search trees are independent of each other, which makes
it possible to implement FLEX by using multiple execution
threads. Parallelization improves the performance
of FLEX and the ability of FMAP to scale up. Section
5 provides more insight into the FLEX implementation.

Currently, FMAP is limited to cooperative goals,
which means that all the goals are defined as global
objectives to all the participating agents (see section
5). Nevertheless, as future work, we are considering
an extension of FMAP to support self-interested agents
with local goals.

FMAP is a general procedure aimed at solving any
kind of MAP task. In particular, solving tightly-coupled
tasks requires a great amount of coordination. Multi-agent
coordination in distributed systems where agents
must cooperate is always a major issue. This dependency
on coordination makes FMAP a communication-reliant
approach. Agents not only have to communicate
the refinement plans that they build at each iteration,
but they also have to communicate during the heuristic
evaluation of the refinement plans in order to maintain
privacy (see subsection 4.3). The usage of a coordinator
agent effectively reduces the need for communication.
The experimental results will show that FMAP can
effectively tackle large problem instances (see section 5).
Nevertheless, reducing communication overhead while
keeping the ability to solve any kind of task remains
an ongoing research topic that we plan to consider for
future developments.

Privacy management is another issue that potentially
worsens the performance of FMAP. In section 3.1
we defined a mechanism to detect and address threats
in partial plans, even when agents do not have a complete
view of such plans. Privacy does not add extra
complexity to FLEX since agents manage the undefined
value $\bot$ as any other value in the domain of a variable.
It does, however, make the refinement-plan communication
stage more complex because, when an agent i
sends $view^j(\Pi)$ to an agent j, this implies that i must
previously adapt the information of $\Pi$ according to the
privacy rules defined w.r.t. j.

Privacy also affects the heuristic evaluation of the
plans in terms of quality. Since a refinement plan is
only evaluated by the agent that generates it and this
evaluation is influenced by the agent’s view of the plan,
the result may not be as accurate as if the agent had had
a complete view of the plan. Empirical results, however,
will show that, even with these limitations, our heuristic
function provides good performance in a wide variety
of planning domains (see section 5).

5 Experimental results 

In order to assess the performance of FMAP, we ran
experimental tests with some of the benchmark problems
from the International Planning Competition² (IPC).
More precisely, we adapted the STRIPS problem suites
of 10 different domains from the latest IPC editions to
a MAP context. The tests compare FMAP with two
different state-of-the-art MAP systems: MAPR [2] and
MAP-POP [37]. We excluded Planning First [25] from
the comparison because it is outperformed by MAP-POP [37].

This section is organized as follows: first, we provide
some information on the FMAP implementation
and experimental setup. Then, we present the features
of the tested domains and we analyze the MAP adaptation
performed for each domain. Next, we show a
comparative analysis between FMAP and the aforementioned
planners, MAPR [2] and MAP-POP [37].
Then, we perform a scalability analysis of FMAP and
MAPR. Finally, we summarize and discuss the results
obtained by FMAP and how they compare to the other
two planners.


5.1 FMAP implementation and experimental setup 

Most multi-agent applications nowadays make use of
middleware multi-agent platforms that provide them
with the communication services required by the agents
[27]. The entire code of FMAP is implemented in Java
and builds upon the Magentix2 platform³ [35]. Magentix2
provides a set of libraries to define the agents’
behavior, along with the communication resources
required by the agents. Magentix2 agents communicate
by means of the FIPA Agent Communication Language
[26]. Messaging is carried out through the Apache QPid
broker⁴, which is a critical component for FMAP agents.

FMAP is optimized to take full advantage of the
CPU execution threads. The FLEX procedure, which
generates refinement plans over a given base plan,
develops a POP search tree for each potentially supportable
action of the agent’s domain. As the POP trees are
completely independent from each other, the processes
for building the trees run in parallel for each agent.

² http://ipc.icaps-conference.org/
³ http://www.gti-ia.upv.es/sma/tools/magentix2
⁴ http://qpid.apache.org/

Agents synchronize their activities at the end of the
refinement plan generation stage. Consequently, FMAP
assigns the same number of execution threads to each
agent so that they all spend a similar amount of time
to complete the FLEX procedure (note that if we allocated
extra threads to a subset of the agents, they
would still have to wait for the slowest agent to synchronize).
Each agent builds as many POP search trees in parallel
as execution threads it has been allocated.
The $h_{DTG}$ heuristic is implemented in a similar way.
An agent can simultaneously evaluate as many plans as
execution threads it has been allocated.

All the experimental tests were performed on a single
machine with a quad-core Intel Core i7 processor
and 8 GB RAM (1.5 GB RAM available for the Java
VM). The CPU used in the experimentation has eight
available execution threads, which are distributed as
follows: in tasks that involve two agents, FMAP allocates
four execution threads per agent; in tasks with
three or four agents, each agent has two available
execution threads; finally, in tasks involving five or more
agents, each agent has a single execution thread at its
disposal. For instance, the three agents in Example 1
would each get two execution threads on this particular
machine. Hence, in the FLEX example depicted
in Fig. 5, agent ta1 would be able to study two candidate
actions simultaneously, thus reducing the execution
time of the overall procedure.
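
The allocation described above coincides with an integer division of the available threads by the number of agents, with a floor of one thread per agent. The helper below is only a sketch that reproduces the stated figures; the function name and the general formula are assumptions, since the text only gives the values for the eight-thread machine.

```python
def threads_per_agent(num_agents, cpu_threads=8):
    """Equal share of execution threads per agent, never less than one."""
    if num_agents < 1:
        raise ValueError("at least one agent is required")
    return max(1, cpu_threads // num_agents)


# Allocation for tasks with 2 to 6 agents on the eight-thread machine.
allocation = {n: threads_per_agent(n) for n in range(2, 7)}
```

For the three agents of Example 1 this gives two threads each, matching the case discussed in the text.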


5.2 Planning domain taxonomy 

The benchmark used for the experiments includes 10 
different domains of the IPCs that are suitable for a 
multi-agent adaptation. The IPC benchmarks come from 
(potential) real-world applications of planning, and they 
have become the de facto mechanism for assessing the 
performance of single-agent planning systems. The elevators
domain, for instance, is inspired by a real problem
of Schindler Lifts Ltd.; the satellite domain is
motivated by a NASA space application [24]; the rovers
domain deals with the daily planning of the activities
of Mars rovers; and the openstacks domain is
based on the minimum maximum simultaneous open
stacks combinatorial optimization problem. Hence, all
the domains from the IPCs resemble practical scenarios 
and they are modeled to keep, as much as possible, both 
their structure and complexity. In MAP, there is not a 
standardized collection of planning domains available. 
Instead, MAP approaches adapt some well-known IPC 

Domain       Typology         IPC  Agents                         Cooperative  Applicability
                                                                  goals        MAPR  FMAP  MAP-POP
Blocksworld  Loosely-coupled  ’98  robot                          No           Yes   Yes   Yes
Driverlog    Loosely-coupled  ’02  driver                         No           Yes   Yes   Yes
Rovers       Loosely-coupled  ’06  rover                          No           Yes   Yes   Yes
Satellite    Loosely-coupled  ’04  satellite                      No           Yes   Yes   Yes
Zenotravel   Loosely-coupled  ’02  aircraft                       No           Yes   Yes   Yes
Depots       Tightly-coupled  ’02  depot / truck                  Yes          No    Yes   Yes
Elevators    Tightly-coupled  ’11  fast-elevator / slow-elevator  Yes          No    Yes   Yes
Logistics    Tightly-coupled  ’00  airplane / truck               Yes          No    Yes   Yes
Openstacks   Tightly-coupled  ’11  manager / manufacturer         Yes          No    Yes   Yes
Woodworking  Tightly-coupled  ’11  machine                        Yes          No    Yes   Yes

Table 1 Features of the MAP domains


domains to a multi-agent context, namely the satellite,
rovers, and logistics domains [2, 25, 37].

Converting planning domains into a multi-agent ver¬ 
sion is not always possible due to the domain charac¬ 
teristics. While some IPC domains have a straightfor¬ 
ward multi-agent decomposition, others are inherently 
single-agent. We developed a domain-dependent tool to 
automatically translate the original STRIPS tasks into 
our PDDL-based MAP language.

The columns in Table 1 describe the main features
of the 10 MAP domains that are included in the bench¬ 
mark. Typology indicates whether the MAP tasks of 
the domain are loosely-coupled or tightly-coupled. IPC 
shows the last edition of the IPC in which the domain 
was included. Agents indicates the types of object used 
to define the agents. Cooperative goals indicates the 
presence or absence of these goals in the tasks of each 
domain. Finally, Applicability shows the MAP systems 
that are capable of coping with each domain. 

In order to come up with a well-balanced bench¬ 
mark, we put the emphasis on the presence (or absence) 
of specialized agents and cooperative goals. Besides the 
adaptation to a multi-agent context, the 10 selected 
domains are a good representative sample of loosely- 
coupled domains with non-specialized agents and tightly- 
coupled domains with cooperative goals. 

Privacy in each domain is defined according to the 
nature of the problem and the type of agents involved, 
while maintaining a correlation and identification with 
the objects in a real-world problem. 

5.2.1 Loosely-coupled domains 

The five loosely-coupled domains presented in Table 1
are: Blocksworld, Driverlog, Rovers, Satellite, and Zenotravel.
The prime characteristic of these domains is that
agents have the same planning capabilities (operators) 
such that each task goal can be individually solved by 
a single agent. That is, tasks can be addressed without 


cooperation among agents. Next, we provide some in¬ 
sight into the features of these domains and the MAP 
adaptations. 

Satellite [24]. This domain offers a straightforward
multi-agent decomposition [25, 37]. The MAP domain
features one agent per satellite. The resulting MAP
tasks are almost decoupled as each satellite can attain
a subset of the task goals (even all the goals in some
cases) without interacting with any other agent. The
number of agents in the tasks of this domain varies from
1 to 12. The location, orientation, and instruments of
a satellite are private to the agent; only the information
on the images taken by the satellites is defined as
public.

Rovers. Like the Satellite domain, Rovers also
offers a straightforward decomposition [25, 37]. The MAP
domain features one agent per rover. Rovers collect 
samples of soil and rock and hardly interact with each 
other except when a soil or rock sample is collected by 
an agent, and so it is no longer available to the rest 
of the agents. The number of agents ranges from 1 to 
8 rovers per task. As in the Satellite domain, only the 
information related to the collected samples is defined 
as public. 

Blocksworld. The MAP version of this domain intro¬ 
duces a set of robot agents (four agents per task), each 
having an arm to arrange blocks. Unlike the original do¬ 
main, the MAP version of Blocksworld allows handling 
more than one block at a time. All the information in 
this domain is considered to be public. 

Driverlog [24]. In this MAP domain, the agents are
the drivers of the problem, ranging between 2 and 8 
agents per task. Driver agents are in charge of driving 
the available trucks and delivering the packages to the 
different destinations. All the information in the do¬ 
main (status of drivers, trucks, and packages) is publi¬ 
cized by the driver agents. 

Zenotravel. This domain defines one agent per
aircraft. The simplest tasks include one agent and the
most complex ones up to five agents. Aircraft can
directly transport passengers to any city in the task. As
in the Blocksworld and Driverlog domains, all the infor¬ 
mation concerning the situation of the passengers and 
the current location of each aircraft is publicly available 
to all the participating agents. 

5.2.2 Tightly-coupled domains 

We also analyzed five additional domains that feature 
specialized agents with different planning capabilities: 
Depots, Elevators, Logistics, Openstacks, and Woodworking.
The features of these domains give rise to complex,
tightly-coupled tasks that require interactions or
commitments among agents in order to be solved.

Depots [24]. This domain includes two different types
of specialized agents, depots and trucks, that must co¬ 
operate in order to solve most of the goals of the tasks. 
This domain, which is the most complex one in our 
MAP benchmark, leads to tightly-coupled MAP tasks 
with many dependences among agents. Depots tasks 
contain a large number of participating agents, rang¬ 
ing from 5 to 12 agents. Only the location of packages 
and trucks is defined as public information. 

Elevators. Each agent in this domain can be a slow- 
elevator or a fast-elevator. Operators in the STRIPS 
domain are basically the same for both types of eleva¬ 
tors since the differences between them only affect the 
action costs. Elevator agents, however, are still special¬ 
ized because the floors they can access are limited. This 
leads to tasks that require cooperation to fulfill some of 
the goals. For instance, an elevator may not be able to 
take a passenger to a certain floor, so it will stop at 
an intermediate floor so that the passenger can board 
another elevator that goes to that floor. Tasks include 
from 3 to 5 agents. Agents share the information re¬ 
garding the location of the different passengers. 

Logistics. This domain presents two different types 
of specialized agents: airplanes and trucks. The delivery 
of some of the packages involves the cooperation of sev¬ 
eral truck and airplane agents (similarly to the example 
task introduced in this article). Tasks feature from 3 to 
10 different agents. Information regarding the position 
of the packages is defined as public. 

Openstacks. This MAP domain includes two
specialized agents in all of the tasks; the manager is 
in charge of handling the orders, and the manufacturer 
controls the different stacks and manufactures the prod¬ 
ucts. Both agents depend on each other to perform their 
activities, thus resulting in tightly-coupled MAP tasks 
with inherently cooperative goals. Most of the infor¬ 
mation regarding the different orders and products is 


public since both agents need it to interact with each 
other. 

Woodworking. This domain features four different 
types of specialized agents (a planer, a saw, a grinder 
and a varnisher) that represent the machines in a pro¬ 
duction chain. In most cases, the output of one machine 
constitutes the input of the following one, so Wood¬ 
working agents have to cooperate to fulfill the different 
goals. All the tasks include four agents (a machine of 
each type). All the information on the status of the dif¬ 
ferent wood pieces is publicized since agents require this 
information in order to operate. 


5.3 FMAP vs. MAPR comparison 

This subsection compares the experimental results of 
FMAP and MAPR [2]. MAPR is implemented in Lisp 
and uses LAMA [29] as the underlying planning system,
without using a middleware platform for multi-agent 
systems. Each experiment is limited to 30 minutes. 

Table 2 shows the comparative results for FMAP
and MAPR. The Solved columns refer to the number
of tasks solved by each approach. The average number
of actions, makespan (plan duration), and search
time consider only the tasks solved by both FMAP
and MAPR (the Common column shows the number
of tasks solved by both planners). Actions, makespan,
and time values in MAPR are relative to the results
obtained with FMAP. The values nx in Table 3 indicate
"n times as much as the FMAP result". Therefore, an
Actions or Makespan value that is higher than 1x is a
better result for FMAP and a value lower than 1x is a
worse result for FMAP. Likewise, a Time value higher
than 1x indicates a better result for FMAP.
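
As a small illustration of how the relative "nx" entries are read, the sketch below converts one entry back into an absolute value and classifies it. The helper names are hypothetical, and since the published ratios are rounded, the recovered absolute values are only approximate.

```python
def absolute_value(fmap_value, relative):
    """Recover the other planner's average from an 'nx' entry."""
    return fmap_value * relative


def better_for_fmap(relative):
    """Actions, makespan, and time are all costs, so the other planner
    needing more than 1x of the FMAP value is a better result for FMAP."""
    return relative > 1.0


# Blocksworld row of Table 2: FMAP averages 17.79 actions, MAPR is at 1.27x.
mapr_actions = absolute_value(17.79, 1.27)  # roughly 22.6 actions
```

For instance, the MAPR Time entry of 0.04x in the same row means that MAPR needs about 4% of the FMAP search time, which is a worse result for FMAP.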

Of the most recent MAP systems, MAPR is one
that offers excellent performance in comparison to other
state-of-the-art MAP approaches [2]. However, as reflected
in Table 1, MAPR is only compatible with the
loosely-coupled domains in the benchmark. This limitation
is due to the planning approach of MAPR. Specifically,
MAPR applies a goal allocation procedure, decomposing
the MAP task into subtasks and giving each
agent a subset of the task goals to solve. Each agent 
subtask is solved with the single-agent planner LAMA 
[29] such that the resulting subplans are progressively 
combined into a global solution. This makes MAPR an 
incomplete planning approach that is limited to loosely- 
coupled tasks without cooperative goals. That is, MAPR 
is built under the assumption that each goal must be 
addressed by at least one of the agents in isolation [2]. 

Whereas the communication overhead is relatively 
high in FMAP (to a large extent, this is due to the use 

                            FMAP                                MAPR
Domain       Tasks  Common  Solved  Actions  Makespan  Time     Solved  Actions  Makespan  Time
Blocksworld  34     19      19      17.79    13.68     86.17    34      1.27x    1.20x     0.04x
Driverlog    20     15      15      24.64    13.93     42.02    18      1.19x    1.53x     0.06x
Rovers       20     19      19      32.63    14.95     53.82    20      0.97x    0.85x     0.05x
Satellite    20     15      16      27.27    16.47     177.65   18      1.14x    1.03x     0.03x
Zenotravel   20     18      18      25.50    13.94     180.62   20      1.24x    1.32x     0.02x

Table 2 Comparison between FMAP and MAPR


of the Magentix MAS platform), agents in MAPR do
not need to communicate during the plan construction
because each agent addresses its allocated subgoals by
itself. This setup has a rather positive impact on the
execution times and the number of problems solved
(coverage). As expected, Table 2 shows that execution times
in MAPR are much lower than in FMAP. With respect to
coverage, MAPR solves 110 out of 114 loosely-coupled
tasks (roughly 96% of the tasks), while FMAP solves
87 of such tasks (76%).
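
The coverage figures above follow directly from the Solved columns of Table 2. The short check below recomputes them; the dictionary layout and variable names are ours.

```python
# (tasks, solved by FMAP, solved by MAPR) per domain, as listed in Table 2.
TABLE_2 = {
    "Blocksworld": (34, 19, 34),
    "Driverlog": (20, 15, 18),
    "Rovers": (20, 19, 20),
    "Satellite": (20, 16, 18),
    "Zenotravel": (20, 18, 20),
}

total_tasks = sum(t for t, _, _ in TABLE_2.values())
fmap_solved = sum(f for _, f, _ in TABLE_2.values())
mapr_solved = sum(m for _, _, m in TABLE_2.values())

fmap_coverage = 100 * fmap_solved / total_tasks
mapr_coverage = 100 * mapr_solved / total_tasks
print(f"FMAP {fmap_solved}/{total_tasks} = {fmap_coverage:.0f}%")
print(f"MAPR {mapr_solved}/{total_tasks} = {mapr_coverage:.0f}%")
```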

However, in most domains, FMAP comes up with 
better quality plans than MAPR, taking into account 
the number of actions as well as the makespan. MAPR 
is limited by the order in which agents solve their sub¬ 
tasks. The first agent that computes a subplan cannot 
take advantage of the potential synergies that may arise 
from other agents’ actions; the second agent has only 
the information of the first agent’s subplan, and so on. 
Additionally, the allocation of goals to each agent may 
lead to poorly balanced plans. Although FMAP is a 
more time-consuming approach, it avoids these limita¬ 
tions because agents work together to build the plan 
action by action. Thus, FMAP provides agents with a 
global view of the plan at each point of the construction 
process, while agents in MAPR keep a local view of the 
plan at hand. 

The Driverlog domain, while being loosely-coupled, 
offers many possible synergies between agents. For in¬ 
stance, a driver agent can use a truck to travel to its 
destination and load a package on its way, while an¬ 
other agent may take over the truck and complete the 
delivery. If the first agent acted in isolation, it would 
deliver the package and then go back to its destination, 
which would result in a worse plan. Robot agents in 
the Blocksworld domain can also cooperate to improve 
the quality of the plans: for instance, a robot can pick 
up a block so that another robot can retrieve the block 
below. Goal balance is also a key aspect in Zenotravel 
since aircraft agents have limited autonomy. If an air¬ 
craft solves too many goals it may be forced to refuel 
thereby worsening the plan quality. 

Fig. 8 illustrates the MAPR limitations by showing
the solution plans obtained by both approaches for
task 8 of the Zenotravel domain. The goals of this task
involve transporting three different people and flying
plane1 to city3. The first three goals are achievable
by all the plane agents, but the last goal can only be
completed by agent plane1.

MAPR starts with agent plane3, which solves all of
the goals that it can. Then, plane1 receives the subplan
and completes it by solving the remaining goal. The
resulting joint plan is far from the optimal solution.
Agent plane3 requires 10 time units to solve its subplan
because it transports all of the passengers. The high
number of fly actions forces the agent to introduce
additional actions to refuel its tank. On the other hand,
agent plane1 flies directly to its destination without
transporting any passengers.

In contrast, agents in FMAP progressively build the
solution plan together without using an a-priori goal
allocation, which allows them to obtain much better quality
plans, taking advantage of synergies between actions
of different agents and effectively balancing the workload
among agents. Fig. 8 shows that, in FMAP, agent
plane1 transports person6 to its destination, thus
simplifying the activities of plane3, which avoids refueling.
The resulting plan is a much shorter and better balanced
solution than the MAPR plan (only 6 time steps
versus 10 time steps in MAPR) and it requires fewer
actions (13 actions versus 16 in MAPR).
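
The effect of the agent ordering on workload balance can be illustrated with a deliberately simplified toy model. It is neither MAPR's nor FMAP's actual procedure: `greedy_allocation` only mimics the behaviour described above (each agent in turn takes every open goal it can solve), and `balanced_allocation` is a stand-in for the more even workload that emerges when agents build the plan jointly. Goal names such as `person-a` are hypothetical, and the number of goals per agent is used as a crude proxy for plan length.

```python
def greedy_allocation(goals, agent_order):
    """Each agent, in turn, takes every still-open goal it is able to solve."""
    load = {agent: [] for agent in agent_order}
    remaining = list(goals)
    for agent in agent_order:
        taken = [g for g in remaining if agent in goals[g]]
        load[agent] = taken
        remaining = [g for g in remaining if g not in taken]
    return load


def balanced_allocation(goals, agents):
    """Give each goal to the least-loaded capable agent, handling the goals
    with the fewest capable agents first."""
    load = {agent: [] for agent in agents}
    for goal in sorted(goals, key=lambda g: len(goals[g])):
        best = min(goals[goal], key=lambda a: len(load[a]))
        load[best].append(goal)
    return load


# goal -> agents able to achieve it (toy version of Zenotravel task 8)
GOALS = {
    "person-a": ["plane3", "plane1"],
    "person-b": ["plane3", "plane1"],
    "person-c": ["plane3", "plane1"],
    "plane1-at-city3": ["plane1"],
}
greedy = greedy_allocation(GOALS, ["plane3", "plane1"])
balanced = balanced_allocation(GOALS, ["plane3", "plane1"])
```

In the greedy case plane3 ends up with all three transport goals and plane1 with only its own relocation goal, whereas the balanced case leaves two goals per agent.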

Table 2 shows that FMAP noticeably improves plan
quality except in the most decoupled domains, namely 
Rovers and Satellite (in the latter, FMAP results are 
slightly better than MAPR results). In these domains, 
synergies among agents are minimal or even nonex¬ 
istent. Consequently, MAPR is not penalized by its 
search scheme, obtaining plans of similar quality to 
FMAP. 

5.4 FMAP vs. MAP-POP comparison 

We compared FMAP with another recent MAP system,
MAP-POP [37]. Like FMAP, MAP-POP agents
explore the space of multi-agent plans jointly. This setup
allows MAP-POP to overcome some of the limitations
of MAPR since it is able to tackle tightly-coupled
tasks with cooperative goals. However, MAP-POP
has two major disadvantages. Much like MAPR,

Fig. 8 Zenotravel task 8 solution plan as obtained by FMAP (upper plan) and MAPR (lower plan)

                            FMAP                                MAP-POP
Domain       Tasks  Common  Solved  Actions  Makespan  Time     Solved  Actions  Makespan  Time
Blocksworld  34     6       19      9.20     7.80      7.57     6       0.91x    0.74x     21.49x
Driverlog    20     2       15      9.50     7.00      0.66     2       1.11x    1.00x     949.39x
Rovers       20     6       19      32.63    14.95     53.82    6       1.01x    1.04x     29.27x
Satellite    20     7       16      17.14    12.57     16.00    7       1.03x    0.89x     0.37x
Zenotravel   20     3       18      7.67     4.33      1.25     3       1.00x    1.00x     87.54x
Depots       20     1       6       14.00    9.00      10.56    1       0.86x    1.00x     2.77x
Elevators    30     22      30      21.32    11.36     14.60    22      1.04x    1.37x     14.23x
Logistics    20     7       10      32.29    12.71     18.26    7       0.97x    0.91x     5.89x
Openstacks   30     0       23      53.13    41.78     268.62   0       -        -         -
Woodworking  30     0       22      16.50    4.45      100.88   0       -        -         -

Table 3 Comparison between FMAP and MAP-POP


MAP-POP is an incomplete approach because it implic¬ 
itly bounds the search tree by limiting its branching fac¬ 
tor. This may prevent agents from generating potential 
solution plans [37]. Additionally, MAP-POP is based
on backward-chaining POP technologies, thus relying 
on heuristics that offer a rather poor performance in 
most MAP domains. 

Table 3 shows the comparison between FMAP and
MAP-POP. As in Table 2, the average results consider
only the tasks solved by both approaches (the FMAP
results for the Openstacks and Woodworking domains
include all the tasks solved by this approach because
MAP-POP does not solve any of the tasks). The
FMAP columns show the results obtained using FMAP
for the common problems; MAP-POP values are relative
to the results of FMAP.

In general, FMAP results are better than MAP-POP
results in almost every aspect. In terms of coverage,
FMAP clearly outperforms MAP-POP, solving 178
out of 244 tasks (roughly 73% of the tasks in the
benchmark), while MAP-POP solves only 54 tasks (22%).
MAP-POP struggles with some
of the most complex tightly-coupled domains (specifically,
Depots, Openstacks, and Woodworking), but it
performs well in the Elevators domain. With respect to
the loosely-coupled domains, MAP-POP solves only
the simplest tasks, from two to seven tasks
per domain.

It is difficult to compare the results related to plan 
quality due to the low coverage of MAP-POP. Focus¬ 
ing on the domains in which MAP-POP solves a sig¬ 
nificant number of tasks, we observe that MAP-POP 
obtains slightly better solution plans than FMAP in 
Blocksworld and Satellite. FMAP, however, outperforms 
MAP-POP in Elevators , the domain in which both ap¬ 
proaches solve the largest number of tasks. 

Finally, the results show that FMAP is much faster
than MAP-POP: about 6 times faster in Logistics and
almost 1000 times faster in the Driverlog domain. MAP-POP
only obtains faster times than FMAP in the seven
Satellite tasks.
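
The speed comparison can be read off the Time column of Table 3. The sketch below lists the ratios and picks out the extremes; the variable names are ours. Note that the smallest ratio above 1x belongs to Depots, where only a single task is common to both planners.

```python
# MAP-POP time relative to FMAP ("nx" Time column of Table 3); domains in
# which MAP-POP solves no task have no ratio and are left out.
TIME_RATIO = {
    "Blocksworld": 21.49, "Driverlog": 949.39, "Rovers": 29.27,
    "Satellite": 0.37, "Zenotravel": 87.54, "Depots": 2.77,
    "Elevators": 14.23, "Logistics": 5.89,
}

# A ratio above 1x means MAP-POP needs more time, so FMAP is faster.
fmap_faster = {d: r for d, r in TIME_RATIO.items() if r > 1.0}
map_pop_faster = sorted(d for d, r in TIME_RATIO.items() if r < 1.0)

largest = max(fmap_faster, key=fmap_faster.get)
smallest = min(fmap_faster, key=fmap_faster.get)
```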


5.5 Scalability analysis 

We prepared two additional experiments to analyze the 
ability of FMAP and MAPR to scale up. The first test 
analyzes how both planners scale up when the number 
of agents of a task is increased, keeping the rest of the 
parameters unchanged. More specifically, we designed a
loosely-coupled logistics-like transportation task, which
is shown in Fig. 9. The basic task includes two different
trucks, t1 and t2. Truck t1 moves between locations l1
and l2, and truck t2 moves between locations l3 and l4;
there is no connection between t1's and t2's locations.
The trucks have to transport a total of four packages,
p1...p4, as shown in Fig. 9. In order to ensure that
MAPR is able to solve the task, both t1 and t2 can
solve two of the four problem goals by themselves: t1
will deliver p1 and p2, while t2 will transport p3 and
p4. Therefore, cooperation is not required in this task,
as opposed to the IPC logistics domain.

Fig. 9 Logistics-like scalability task

Fig. 10 Scalability results for the logistics-like task

We defined and ran 14 different tests for this basic
task. In each test, the number of agents in the task is
increased by one, ranging from 2 to 15 truck agents. The
problems are modeled so that the extra truck agents,
t3...t15, are placed in a separate location l5, from
which there is no access to the locations that t1 and
t2 can move through. Therefore, the additional agents
included in each task are unable to solve any of the
task goals. However, they do propose refinement plans
in FMAP (more precisely, they introduce an action to
move to l6, as shown in Fig. 9), increasing the complexity
of the task in terms of both the number of messages
exchanged and the branching factor of the FMAP
search tree.

The plot in Fig. 10 separately depicts the time required
by each process in FMAP. We show the time
required by FLEX to generate the refinement plans,
the time consumed by the h_DTG evaluation procedure,
and the time spent by agents to communicate and
synchronize, which includes the base plan selection and the
exchange of plans among agents. Every task was solved
by FMAP in 14 iterations, resulting in a 12-action
solution plan (truck t1 and truck t2 each introduced six
actions).

Fig. 11 Scalability results for the satellite task

As Fig. 10 shows, FLEX has a noticeably low impact
on the overall execution time. This indicates that,
even when dealing with privacy and building a tree for
each potentially supportable action, FLEX offers good
performance and does not limit FMAP’s scalability.

Even though each task only required 14 iterations to 
be solved, the growing number of agents increases the 
size of the search tree. In the two-agent task, the agents 
generate an average of 3.3 refinement plans per itera¬ 
tion, while in the 15-agent task, the average branching 
factor goes up to 11.8 refinement plans. Nevertheless, 
this does not affect the time consumed by h_DTG, which
remains relatively constant in all tasks. Since agents 
evaluate plans simultaneously, the evaluation time hardly 
grows when the number of participants increases. 

Fig. 10 confirms that communications among agents
are the major bottleneck of FMAP. As the number of 
agents increases, so does the branching factor. There¬ 
fore, each agent has to communicate more refinement 
plans to a higher number of participants. Synchronizing 
a larger number of agents is also more complex, which 
increases the number of exchanged messages. All these 
communications are managed by a centralized component,
the Qpid broker, which is negatively affected by
the communication overhead of the system. 
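
A rough back-of-the-envelope model shows why the message volume grows so quickly. It assumes that every refinement plan generated in an iteration is sent once to each of the other agents; this is a simplifying assumption of ours, not a measurement from the experiments, and it ignores synchronization messages as well as any filtering due to privacy. The function name `plan_messages` is hypothetical.

```python
def plan_messages(branching_factor, n_agents, iterations):
    """Rough count of plan messages if every refinement plan generated in an
    iteration were sent once to each of the other agents (an assumption)."""
    return branching_factor * (n_agents - 1) * iterations


# Average branching factors and iteration count reported for the
# logistics-like scalability test.
two_agents = plan_messages(3.3, 2, 14)
fifteen_agents = plan_messages(11.8, 15, 14)
growth = fifteen_agents / two_agents
```

Under this assumption the 15-agent task exchanges about 50 times as many plan messages as the two-agent task, even though both are solved in the same number of iterations.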

The behaviour of MAPR remains constant in all
of the tests, taking about 0.2 seconds to solve each
task. Since MAPR does not require communications,
the growing number of agents does not affect its
performance. Note that if we consider only the time spent
by h_DTG (around 0.8 seconds per test) and FLEX
(approximately 0.02 seconds), FMAP execution times are
quite similar to those of MAPR.
This loosely-coupled task was deliberately designed to
require no coordination so that FMAP could be compared
with MAPR. However, FMAP applies the same coordination
mechanism and message exchange to all planning
tasks. Hence, the ability to solve tightly-coupled
tasks comes at the price of a substantial coordination
effort, which MAPR does not incur.

We performed a second experiment based on the
satellite domain to assess the scalability of the two planners
when both the number of agents and the number
of goals increase, thus increasing the complexity of the
task. Again, we defined 14 MAP tasks, ranging from 2
to 15 satellite agents. The simplest task comprises two
satellite agents, s1 and s2, which must take an image
of two different planets. The satellites are configured
so that each one of them can capture an image of a
single planet. The instruments they have on board are
turned on and calibrated, so each agent can directly
reorient and acquire the image. Unlike the first test, each
satellite task adds one more goal over the previous task,
as well as an extra agent. Then, the additional agents,
s3...s15, must each solve a goal by themselves. This
increases the branching factor as well as the number of
iterations for solving a task.

Fig. 11 shows the results for this scenario. The solution
plans obtained by FMAP range from 4 actions (in
the two-agent task) to 30 actions (in the 15-agent task). 
FMAP required 31 iterations to solve the 15-agent task 
and only 4 iterations for the two-agent task. The grow¬ 
ing complexity also affects the average branching factor, 
which ranges from 25.67 to 255.06 plans. 

As Fig. 11 shows, the complexity of the tasks does
not affect FLEX, which takes less than 0.2 seconds in 
each task. In general, the performance of FLEX only 
decreases when handling very large base plans in do¬ 
mains with many applicable actions. We can therefore 
conclude that FLEX is an efficient and highly scalable 
component of FMAP. 

With regard to the h_DTG heuristic, evaluation times
range from 0.35 seconds for the simplest task to 26.64
seconds for the most complex one. Although the evaluation
time is higher than the generation time,
we consider this a good performance given
that: 1) the branching factor and the number of
iterations increase from task to task, which results in
a much larger number of plans to evaluate; and 2) unlike
FLEX, the evaluation function h_DTG also involves
some communications among agents, which
increase when the number of agents goes up. All in
all, and considering just the times of h_DTG and FLEX,
FMAP is only about 9 times slower in the 15-agent task
than MAPR, which completes this task in 3 seconds.
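
The "about 9 times" figure can be checked from the numbers quoted above. The sketch uses 0.2 seconds for FLEX, which is the reported upper bound rather than an exact measurement, so the result is itself an upper bound; the constant names are ours.

```python
H_DTG_SECONDS = 26.64  # evaluation time reported for the 15-agent task
FLEX_SECONDS = 0.2     # upper bound: FLEX takes less than 0.2 seconds
MAPR_SECONDS = 3.0     # MAPR time reported for the same task

slowdown = (H_DTG_SECONDS + FLEX_SECONDS) / MAPR_SECONDS
print(f"FMAP (h_DTG + FLEX only) is about {slowdown:.1f}x slower than MAPR")
```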


In summary, both tests confirm that communica¬ 
tion overhead is the main issue of FMAP with regard 
to scalability. Communicating plans and synchronizing 
agents are rather costly tasks, especially when dealing 
with complex tasks that combine a large branching fac¬ 
tor and a high number of participating agents. 

5.6 Discussion of the results 

The experimental results support our initial claim: FMAP
is a domain-independent approach that offers a good
trade-off between coverage and execution times
and is able to solve any typology of MAP task.

We compared FMAP against two different state-of- 
the-art MAP approaches. On the one hand, MAPR 
is designed as a fast MAP solver. The results show 
that MAPR provides excellent execution times, but its 
performance comes at a cost: it completely rules out 
tightly-coupled domains that require cooperation. Many 
real-world domains, such as logistics or production supply- 
chains, require cooperation between independent enti¬ 
ties. Hence, non-cooperative planners for solving dis¬ 
joint subtasks in which agents can effectively avoid in¬ 
teractions are not suitable for many real-world MAP 
problems. Overall, in the experiments, MAPR solves 
45% of the whole benchmark while FMAP solves 73% 
of the tasks. 

On the other hand, MAP-POP is, like FMAP, a general
approach that is capable of solving any type of planning
task. The approach followed by MAP-POP is
clearly influenced by the use of backward-chaining POP
technologies and, in particular, by the application of
poorly informed heuristics. This planner offers the worst
results in terms of coverage and execution times, thus
indicating that FMAP represents a step forward in
multi-agent cooperative planning.

With regard to the scalability tests, the results show
that FMAP's ability to scale up is affected only by
communications. While MAPR performance
is unaltered when the number of agents increases, FMAP
performance is affected by its heavy dependency on
agent communications. These results lead us to one of
our future lines of work: studying techniques to reduce
communication overhead without losing the ability to
tackle any kind of MAP task.

6 Conclusions 

FMAP is a general-purpose MAP model that supports 
inherently distributed domains and defines an advanced 
notion of privacy. Agents in FMAP use an internal 
POP procedure to calculate all possible ways to refine 
a plan, which guarantees FMAP completeness. Agents
exchange plans and their evaluations by means of a 
communication mechanism that is governed by a co¬ 
ordinator agent. FMAP exploits the structure of dis¬ 
tributed state-independent domain transition graphs for 
the heuristic evaluation of the plans, thus avoiding hav¬ 
ing to recalculate estimates in each node of the POP 
search tree. 

Privacy is maintained throughout the entire search 
process. Agents only communicate the relevant infor¬ 
mation they share with the rest of the agents. This 
advanced notion of privacy is very useful for model¬ 
ing real-world problems. The experiments show that 
dealing with privacy has a relatively low impact on the 
overall performance of FMAP. 

The exhaustive testing on IPC benchmarks shows
that FMAP outperforms other state-of-the-art MAP
frameworks because it is capable of solving tightly-coupled
domains with specialized agents and cooperative goals
as well as loosely-coupled problems. The performance
of FMAP is only affected by the extensive communications
among agents. To the best of our knowledge,
FMAP is likely to be the most competitive
domain-independent cooperative MAP system currently available.